LANGUAGES OF THE WORLD  Estimates of the number of languages spoken in the world today vary depending on where the dividing line between language and dialect is drawn. For instance, linguists disagree over whether Chinese should be considered a single language because of its speakers' shared cultural and literary tradition, or whether it should be considered several different languages because of the mutual unintelligibility of, for example, the Mandarin spoken in Beijing and the Cantonese spoken in Hong Kong (see Chinese Language). If mutual intelligibility is the basic criterion, current estimates indicate that there are about 6000 languages spoken in the world today. However, many languages with small numbers of speakers are in danger of being replaced by languages with large numbers of speakers. In fact, some scholars believe that perhaps 90 percent of the languages spoken in the 1990s will be extinct or doomed to extinction by the end of the 21st century. The 12 most widely spoken languages, with approximate numbers of native speakers, are as follows: Mandarin Chinese, 836 million; Hindi, 333 million; Spanish, 332 million; English, 322 million; Bengali, 189 million; Arabic, 186 million; Russian, 170 million; Portuguese, 170 million; Japanese, 125 million; German, 98 million; French, 72 million; Malay, 50 million. If second-language speakers are included in these figures, English is the second most widely spoken language, with 418 million speakers. See also Indian Languages.

A Language Classification  Linguists classify languages using two main classification systems: typological and genetic. A typological classification system organizes languages according to the similarities and differences in their structures. Languages that share the same structure belong to the same type, while languages with different structures belong to different types. For example, despite the great differences between the two languages in other respects, Mandarin Chinese and English belong to the same type in word-order typology, because both have a basic word order of subject-verb-object.

A genetic classification of languages divides them into families on the basis of their historical development: A group of languages that descend historically from the same common ancestor form a language family. For example, the Romance languages form a language family because they all descended from the Latin language. Latin, in turn, belongs to a larger language family, Indo-European, the ancestor language of which is called Proto-Indo-European. Some genetic groupings are universally accepted. However, because documents attesting to the form of most ancestor languages, including Proto-Indo-European, have not survived, much controversy surrounds the more wide-ranging genetic groupings. A conservative survey of the world's language families follows.

B Indo-European Language Family  The Indo-European languages are the most widely spoken languages in Europe, and they also extend into western and southern Asia. The family consists of a number of subfamilies or branches (groups of languages that descended from a common ancestor, which in turn is a member of a larger group of languages that descended from a common ancestor). Most of the people in northwestern Europe speak Germanic languages, which include English, German, and Dutch as well as the Scandinavian languages, such as Danish, Norwegian, and Swedish. The Celtic languages, such as Welsh and Gaelic, once covered a large part of Europe but are now restricted to its western fringes. The Romance languages, all descended from Latin, are the only survivors of a somewhat more extensive family, Italic, which also included a number of now-extinct languages of Italy besides Latin (see Italic Languages). Languages of the Baltic and Slavic (Slavonic) branches are closely related. Only two of the Baltic languages survive: Lithuanian and Latvian. The Slavic languages, which cover much of eastern and central Europe, include Russian, Ukrainian, Polish, Czech, Serbo-Croatian, and Bulgarian. In the Balkan Peninsula, two branches of Indo-European exist that each consist of a single language, namely the Greek language and the Albanian language. Farther east, in Caucasia, the Armenian language constitutes another single-language branch of Indo-European.

The other main surviving branch of the Indo-European family is Indo-Iranian (see Indo-Iranian Languages). It has two subbranches, Iranian and Indo-Aryan (Indic). Iranian languages are spoken mainly in southwestern Asia and include Persian, Pashto (spoken in Afghanistan), and Kurdish. Indo-Aryan languages are spoken in the northern part of South Asia (Pakistan, northern India, Nepal, and Bangladesh) and also in most of Sri Lanka (see Indian Languages). This branch includes Hindi-Urdu, Bengali, Nepali, and Sinhalese (the language spoken by the majority of people in Sri Lanka). Historical documents attest to other, now extinct, branches of Indo-European, such as the Anatolian languages, which were once spoken in what is now Turkey and include the ancient Hittite language.

C Other European Language Families  The Uralic languages constitute the other main language family of Europe. They are spoken mostly in the northeastern part of the continent, spilling over into northwestern Asia; one language, Hungarian, is spoken in central Europe. Most Uralic languages belong to the family's Finno-Ugric branch (see Finno-Ugric Languages). This branch includes (in addition to Hungarian) Finnish, Estonian, and Saami. Europe also has one language isolate (a language not known to be related to any other language): Basque, which is spoken in the Pyrenees. At the boundary between southeastern Europe and Asia lie the Caucasus Mountains. Since ancient times the region has contained a large number of languages, including two groups of languages that have not been definitively related to any other language families. The South Caucasian, or Kartvelian, languages are spoken in Georgia and include the Georgian language. The North Caucasian languages fall into North-West Caucasian, North-Central Caucasian, and North-East Caucasian subgroups. The genetic relation of North-West Caucasian to the other subgroups is not universally agreed upon. The North-West Caucasian languages include Abkhaz, the North-Central Caucasian languages include Chechen, and the North-East Caucasian languages include the Avar language (see Caucasian Languages).

D Asian and Pacific Language Families  South Asia contains, in addition to the Indo-Aryan branch of Indo-European, two other large language families. The Dravidian family is dominant in southern India and includes Tamil and Telugu. The Munda languages represent the Austro-Asiatic language family in India and contain many languages, each with relatively small numbers of speakers. The Austro-Asiatic family also spreads into Southeast Asia, where it includes the Khmer (Cambodian) and Vietnamese languages (see Austro-Asiatic Languages). South Asia contains at least one language isolate, Burushaski, spoken in a remote part of northern Pakistan. See also Indian Languages.

A number of linguists believe that many of the languages of central, northern, and eastern Asia form a single Altaic language family, although others consider Turkic, Tungusic, and Mongolic to be separate, unrelated language families (see Altaic Languages). The Turkic languages include Turkish and a number of languages of the former Union of Soviet Socialist Republics (USSR), such as Uzbek and Tatar. The Tungusic languages are spoken mainly by small population groups in Siberia and Northeast China. This family includes the nearly extinct Manchu language. The main language of the Mongolic family is Mongolian. Some linguists also assign Korean and Japanese to the Altaic family, although others regard these languages as isolates. In northern Asia there are a number of languages that appear either to form small, independent families or to be language isolates, such as the Chukotko-Kamchatkan language family of the Chukot and Kamchatka peninsulas in the far east of Russia. These languages are often referred to collectively as Paleo-Siberian (Paleo-Asiatic), but this is a geographic, not a genetic, grouping.

The Sino-Tibetan language family covers not only most of China, but also much of the Himalayas and parts of Southeast Asia (see Sino-Tibetan Languages). The family's major languages are Chinese, Tibetan, and Burmese. The Tai languages constitute another important language family of Southeast Asia. They are spoken in Thailand, Laos, and southern China and include the Thai language. The Miao-Yao, or Hmong-Mien, languages are spoken in isolated areas of southern China and northern Southeast Asia. The Austronesian languages, formerly called Malayo-Polynesian, cover the Malay Peninsula and most islands to the southeast of Asia and are spoken as far west as Madagascar and throughout the Pacific islands as far east as Easter Island. The Austronesian languages include Malay (called Bahasa Malaysia in Malaysia, and Bahasa Indonesia in Indonesia), Javanese, Hawaiian, and Maori (the language of the aboriginal people of New Zealand).

Although the inhabitants of some of the coastal areas and offshore islands of New Guinea speak Austronesian languages, most of the main island's inhabitants, as well as some inhabitants of nearby islands, speak languages unrelated to Austronesian. Linguists collectively refer to these languages as Papuan languages, although this is a geographical term covering about 60 different language families. The languages of the Australian Aborigines constitute another unrelated group, and it is debatable whether all Australian languages form a single family (see Australia).

E African Language Families  The languages of Africa may belong to as few as four families: Afro-Asiatic, Nilo-Saharan, Niger-Congo, and Khoisan, although the genetic unity of Nilo-Saharan and Khoisan is still disputed (see African Languages). Afro-Asiatic languages occupy most of North Africa and also large parts of southwestern Asia. The family consists of several branches. The Semitic branch includes Arabic, Hebrew, and many languages of Ethiopia and Eritrea, including Amharic, the dominant language of Ethiopia (see Semitic Languages). The Chadic branch, spoken mainly in northern Nigeria and adjacent areas, includes Hausa, one of the two most widely spoken languages of sub-Saharan Africa (the other being Swahili). Other subfamilies of Afro-Asiatic are Berber, Cushitic, and the single-language branch Egyptian, which contains the now-extinct language of the ancient Egyptians (see Egyptian Language; Coptic Language).

The Niger-Congo family covers most of sub-Saharan Africa and includes such widely spoken West African languages as Yoruba and Fulfulde, as well as the Bantu languages of eastern and southern Africa, which include Swahili and Zulu. The Nilo-Saharan languages are spoken mainly in eastern Africa, in an area between those covered by the Afro-Asiatic and the Niger-Congo languages. The best-known Nilo-Saharan language is Masai, spoken by the Masai people in Kenya and Tanzania. The Khoisan languages are spoken in the southwestern corner of Africa and include the Nama language (formerly called Hottentot).

F Language Families of the Americas  Some linguists group all indigenous languages of the Americas into just three families, while most separate them into a large number of families and isolates. One well-established family is Eskimo-Aleut, which stretches from the eastern edge of Siberia to the Aleutian Islands, and across Alaska and northern Canada to Greenland, where one variety of the Inuit language, Greenlandic, is an official language. The Na-Dené languages, the main branch of which comprises the Athapaskan languages, occupy much of northwestern North America. The Athapaskan languages also include, however, a group of languages in the southwestern United States, one of which is Navajo. Languages of the Algonquian and Iroquoian families constitute the major indigenous languages of northeastern North America, while the Siouan family is one of the main families of central North America.

The Uto-Aztecan family extends from the southwestern United States into Central America and includes Nahuatl, the language of the Aztec civilization and its modern descendants (see Aztec Empire). The Mayan languages are spoken mainly in southern Mexico and Guatemala (see Maya). Major language families of South America include Carib and Arawak in the north, and Macro-Gê and Tupian in the east. Guaraní, recognized as a national language in Paraguay alongside the official language, Spanish, is an important member of the Tupian family. In the Andes Mountains region, the dominant indigenous languages are Quechua and Aymara; the genetic relation of these languages to each other and to other languages remains controversial. See also Native American Languages.

G Pidgin and Creole Languages  Individual pidgin and creole languages pose a particular problem for genetic classification because each draws its vocabulary and its grammar from different sources. Consequently, many linguists do not try to classify them genetically. Pidgin and creole languages are found in many parts of the world, but there are particular concentrations in the Caribbean, West Africa, and the islands of the Indian Ocean and the South Pacific. English-based creoles such as Jamaican Creole and Guyanese Creole, and French-based creoles such as Haitian Creole, can be found in the Caribbean. English-based creoles are widespread in West Africa. About 10 percent of the population of Sierra Leone speaks Krio as a native language, and an additional 85 percent speaks it as a second language. The creoles of the Indian Ocean islands, such as Mauritius, are French-based. An English-based pidgin, Tok Pisin, is spoken by more than 2 million people in Papua New Guinea, making it the most widely spoken auxiliary language of that country. The inhabitants of Solomon Islands and Vanuatu speak similar varieties of Tok Pisin, called Pijin and Bislama, respectively.

H International Languages  International languages include both existing languages that have become international means of communication and languages artificially constructed to serve this purpose. The best-known artificial international language is Esperanto; however, the most widespread international languages are natural, not artificial. In medieval Europe, Latin was the principal international language. Today, English is used in more countries as an official language or as the main means of international communication than any other language. French is the second most widely used international language, largely because of the substantial number of African countries with French as their official language. Other languages have more restricted regional use, such as Spanish in Spain and Latin America, Arabic in the Middle East, and Russian in the republics of the former USSR.