The sci.lang FAQ: 9 - 14

9 Why do Hebrew and Yiddish, Japanese and Chinese, Persian and Arabic look so much alike if they aren't related?

Distinguish language from writing system. In each of these cases one language has adopted part or all of the writing system of an unrelated language.

(To a Chinese person, English and Finnish look alike, because they're written in the same alphabet. Yet they are not historically related.)

An excellent introduction to writing systems is Geoffrey Sampson's Writing Systems (1985). The authoritative (but expensive) reference is Daniels and Bright's The World's Writing Systems (1996), which discusses every known script.

10 How do linguists decide that languages are related?

When linguists say that languages are related, they're not just remarking on their surface similarity; they're making a technical statement or claim about their history-- namely, that they can be regularly derived from a common parent language.

Proto-languages are reconstructed using the comparative method. The first stage is to inspect and compare large amounts of vocabulary from the languages in question. Where possible we compare entire paradigms (sets of related forms, such as those of the present active indicative in Latin), rather than individual words.

The inspection should yield a set of regular sound correspondences between the languages. By regular, we mean that the same correspondences are consistently observed in identical phonetic environments. Finally, sound changes are formulated: language-specific rules which specify how the original common form changed in order to produce those observed in each descendant language.

Applying the comparative method to the Romance languages, we might find

            Sard      French    Italian        Spanish
'I sense'   /sento/   /sa~/     /sento/        /sjEnto/
'sleep'     /sonnu/   /som/     /sonno/        /suEn^o/
'hundred'   /kentu/   /sa~/     /tSento/       /sjEnto/
'five'      /kimbe/   /sE~k/    /tSinkwe/      /sinko/
'I run'     /kurro/   /kur/     /korro/        /korro/
'story'     /kontu/   /ko~t@/   /(rak)konto/   /kuEnto/

and hundreds of similar examples. We see some correspondences--

     Sard   French   Italian   Spanish
(1)  /s/    /s/      /s/       /s/
(2)  /k/    /s/      /tS/      /s/
(3)  /k/    /k/      /k/       /k/

but they seem to conflict: does Sard /k/ correspond to Spanish /s/ or /k/? Does French /s/ correspond to Italian /s/ or /tS/?

In fact we will find that the correspondences are regular, once we observe that (2) is seen before a front vowel (i or e), while (3) is seen in other environments. Alternations within paradigms, such as It. /diko/ 'I say' vs. /ditSe/ 'says', will help us make and confirm such generalizations.
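The check for regularity can be sketched in a few lines of Python. This is only an illustration on the six words shown, not a real reconstruction tool: it looks at word-initial segments only, and it assumes that the vowel following the Sard initial can stand in for the original environment. The names COGNATES, initial, correspondences and is_regular are invented for this sketch.

```python
from collections import defaultdict

# Forms copied from the Romance examples, in the order
# Sard, French, Italian, Spanish.
# The optional prefix in Italian /(rak)konto/ is left off.
COGNATES = {
    "I sense": ("sento", "sa~", "sento", "sjEnto"),
    "sleep": ("sonnu", "som", "sonno", "suEn^o"),
    "hundred": ("kentu", "sa~", "tSento", "sjEnto"),
    "five": ("kimbe", "sE~k", "tSinkwe", "sinko"),
    "I run": ("kurro", "kur", "korro", "korro"),
    "story": ("kontu", "ko~t@", "konto", "kuEnto"),
}

FRONT_VOWELS = "ie"


def initial(form):
    """Return the first segment, treating the affricate tS as one unit."""
    return "tS" if form.startswith("tS") else form[0]


def correspondences(use_environment):
    """Group the four-language initial correspondences by Sard initial,
    optionally split by whether a front vowel follows."""
    groups = defaultdict(set)
    for forms in COGNATES.values():
        sard = forms[0]
        key = initial(sard)
        if use_environment:
            # Assumption: the Sard vowel stands in for the original environment.
            key = (key, sard[1] in FRONT_VOWELS)
        groups[key].add(tuple(initial(f) for f in forms))
    return dict(groups)


def is_regular(groups):
    """Regular here means exactly one outcome per key."""
    return all(len(outcomes) == 1 for outcomes in groups.values())


print(is_regular(correspondences(use_environment=False)))
print(is_regular(correspondences(use_environment=True)))
```

Without the environment, Sard /k/ maps to two different outcomes, so the first check fails; once the following vowel is taken into account, every key has a single outcome and the second check succeeds.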

We may interpret these now-regular correspondences as indicating that an initial /s/ in the proto-language has been retained in all four languages, and likewise initial /k/ in Sard; but that /k/ changed to /s/ or /tS/ in the other languages in the environment of a front vowel.
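A sound change of this kind can be written as a small rewrite rule. The sketch below is a toy under stated assumptions: it models only the word-initial consonant, it uses the Sard forms as stand-ins for the proto-forms (since Sard keeps initial /k/), and the names RULES, OBSERVED, apply_rule and predicts_initials are made up for the illustration.

```python
import re

# Outcome of initial /k/ before a front vowel (i or e) in each language.
RULES = {"Sard": "k", "French": "s", "Italian": "tS", "Spanish": "s"}

# Sard form (stand-in for the proto-form) -> observed forms elsewhere.
OBSERVED = {
    "sento": {"French": "sa~", "Italian": "sento", "Spanish": "sjEnto"},
    "sonnu": {"French": "som", "Italian": "sonno", "Spanish": "suEn^o"},
    "kentu": {"French": "sa~", "Italian": "tSento", "Spanish": "sjEnto"},
    "kimbe": {"French": "sE~k", "Italian": "tSinkwe", "Spanish": "sinko"},
    "kurro": {"French": "kur", "Italian": "korro", "Spanish": "korro"},
    "kontu": {"French": "ko~t@", "Italian": "konto", "Spanish": "kuEnto"},
}


def initial(form):
    """Return the first segment, treating the affricate tS as one unit."""
    return "tS" if form.startswith("tS") else form[0]


def apply_rule(form, lang):
    """Rewrite initial k before i or e; leave everything else untouched."""
    return re.sub(r"^k(?=[ie])", RULES[lang], form)


def predicts_initials(lang):
    """Check the rule's predicted initial against every observed form."""
    return all(
        initial(apply_rule(stand_in, lang)) == initial(forms[lang])
        for stand_in, forms in OBSERVED.items()
    )


for lang in ("French", "Italian", "Spanish"):
    print(lang, predicts_initials(lang))
```

The rule says nothing about the rest of each word (the vowels and endings have their own histories), which is why only initials are compared.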

Actually, this process is iterative. For instance, at first glance we might think that German haben and Latin habere 'have' are obvious cognates. However, after noting the regular correspondence of German h to Latin c, we are forced to change our minds, and look to capere 'seize' as a better cognate for haben.

Thus, similarity of words is only a clue, and perhaps a misleading one. Linguists conclude languages are related, and thus derive from a common ancestor, only if they find regular sound correspondences between them.

To complicate things, derivations may be obscured by irregular changes, such as dissimilation, borrowing, or analogical change. For instance, the normal development of Middle English kyn is 'kine', but this word has been largely replaced by 'cows', formed from 'cow' (ME cou) on the analogy of word-pairs like stone : stones. Analogy often serves to reduce irregularities in a language (here, an unusual plural).

Borrowing refers to taking words from other languages, as English has taken 'search' and 'garage' from French, 'paternal' from Latin, 'anger' from Old Norse, and 'tomato' from Nahuatl. How do we know that English doesn't derive from French or Nahuatl? The latter case is easy to eliminate: regular sound correspondences can't be set up between English and Nahuatl.

But English has borrowed so heavily from French that regular correspondences do occur. Here, however, we find that the French borrowings are thickest in government, legal, and military domains; while the basic vocabulary (which languages borrow less frequently) is more akin to German. Paradigmatic correspondences like sing/sang/sung vs. singen/sang/gesungen also help show that the Germanic words are inherited, the French ones borrowed.

If you want more, Theodora Bynon's Historical Linguistics (1977) is very good, and not long; R.L. Trask's Historical Linguistics (1996) is very readable and covers more recent developments. Anthony Fox's Linguistic Reconstruction: An Introduction to Theory and Method (1995) concentrates on the reconstruction process itself, and assumes some knowledge of linguistics.

11 What is Noam Chomsky's transformational grammar all about?

Several things; it really comprises several layers of theory:

(1) The hypothesis that much of the structure of human language is inborn ('built-in') in the human brain, so that a baby learning to talk only has to learn the vocabulary and the structural 'parameters' of his native language -- he doesn't have to learn how language works from scratch.

The main evidence consists of:

This theory is by no means accepted by all linguists, though many would agree that some core part of language is innate.

(2) The hypothesis that to adequately describe the grammar of a human language, you have to give each sentence at least two different structures, called deep structure and surface structure, together with rules called transformations that relate them.

This is hotly debated. Some theories of grammar use two levels and some don't. Chomsky's original monograph, Syntactic Structures (1957), is still well worth reading; this is what it deals with.

(3) Chomsky's name is associated with specific flavors of transformational grammar. The model elaborated over the last few years is called GB (government and binding) theory, which however has been heavily modified by the approach described in the recent The Minimalist Program (1995).

(4) Some people think Chomsky is the source of the idea that grammar ought to be viewed with mathematical precision. (Thus there are occasional vehement anti-Chomsky polemics such as The New Grammarians' Funeral, which are really polemics against grammar per se.)

Although Chomsky contributed some valuable techniques, grammarians have always believed that grammar was a precise, mechanical thing. They are highly divided, however, on the nature and function of those mechanisms!

12 What is a dialect?

[--M.C. + M.R.]

A dialect is any variety of a language spoken by a specific community of people. Most languages have many dialects.

Everyone speaks a dialect. In fact everyone speaks an idiolect, i.e., a personal language. (Your English language is not quite the same as my English language, though they are probably very, very close.)

A group of people with very similar idiolects are considered to be speaking the same dialect. Some dialects, such as Standard American English, are taught in schools and used widely around the world. Others are very localized.

Localized or uneducated dialects are not merely failed attempts to speak the standard language. William Labov and others have demonstrated, for example, that the speech of inner-city blacks has its own intricate grammar, quite different in some ways from that of Standard English.

It should be emphasized that linguists do not consider some dialects superior to others-- though speakers of the language may do so; and linguists do study people's attitudes toward language, since these have a strong effect on the development of language.

Linguists call varieties of language dialects if the speakers can understand each other and languages if they can't. For example, Irish English and Southern American English are dialects of English, but English and German are different languages (though related).

This criterion is not always as easy to apply as it sounds. Intelligibility may vary with familiarity and interest, or may depend on the subject. A more serious problem is the dialect continuum: a chain of dialects such that any two adjoining dialects are mutually intelligible, but the dialects at the ends are not. Speakers of Belgian Dutch, for instance, can't understand Swiss German, but between them there lies a continuum of mutually intelligible dialects.
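The trouble a continuum causes can be shown with a toy model. In the sketch below only the two endpoints come from the example above; the intermediate labels are placeholders rather than real dialect names, and the rule that only adjoining varieties understand each other is a deliberate simplification.

```python
# Hypothetical chain: "variety A" to "variety C" are placeholders.
CHAIN = ["Belgian Dutch", "variety A", "variety B", "variety C", "Swiss German"]


def mutually_intelligible(a, b):
    """Toy assumption: only identical or adjoining varieties understand each other."""
    return abs(CHAIN.index(a) - CHAIN.index(b)) <= 1


def linked(a, b):
    """True if a chain of mutually intelligible neighbours connects a and b."""
    i, j = sorted((CHAIN.index(a), CHAIN.index(b)))
    return all(mutually_intelligible(CHAIN[k], CHAIN[k + 1]) for k in range(i, j))


print(mutually_intelligible("Belgian Dutch", "Swiss German"))
print(linked("Belgian Dutch", "Swiss German"))
```

The endpoints are not mutually intelligible, yet they are linked step by step. Mutual intelligibility is therefore not transitive, and it cannot by itself divide the chain into separate languages.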

Sometimes the use of the terms 'language' or 'dialect' is politically motivated. Norwegian and Danish (being mutually intelligible) are dialects of the same language, but are considered separate languages because of their political independence. By contrast, Mandarin and Cantonese, which are mutually unintelligible, are often referred to as 'dialects' of Chinese, due to the political and cultural unity of China, and because they share a common written language.

At this point we usually quote Max Weinreich: "A language is a dialect with an army and a navy."

Because of such problems, some linguists reject the mutual intelligibility criterion; but they do not propose to return to arguments on political and cultural grounds. Instead, they prefer not to speak of dialects and languages at all, but only of different varieties, with varying degrees of mutual intelligibility.

13 Are all languages equally complex, or are some more primitive than others?

[--M.C. + M.R.]

Before the 1900s many people believed that so-called 'primitive peoples' would have primitive languages, and that Latin and Greek-- or their own languages-- were inherently superior to other tongues.

In fact, however, there is no correlation between type or complexity of culture and any measure of language complexity. Peoples of very simple material culture, such as the Australian Aborigines, are often found to speak very complex languages.

Obviously, the size of the vocabulary and the variety and sophistication of literary forms will depend on the culture. The grammar of all languages, however, tends to be about equally complex-- although the complexity may be found in different places. Latin, for instance, has a much richer system of inflections than English, but a less complicated syntax.

As David Crystal puts it, "All languages meet the social and psychological needs of their speakers, are equally deserving of scientific study, and can provide us with valuable information about human nature and society."

There are only two cases of really simple languages.

14 What about artificial languages, such as Esperanto?

Hundreds of constructed languages have been devised in the last few centuries. Early proposals, such as those of Lodwick (1647), Wilkins (1668), or Leibniz (1678), were attempts to devise an ideal language based on philosophical classification of concepts, and used wholly invented words. Most were too complex to learn, but one, Jean Francois Sudre's Solresol (1866), achieved some popularity in the last century; its entire vocabulary was built from the names of the notes of the musical scale, and could be sung as well as spoken.

Later the focus shifted to languages based on existing languages, with a polyglot (usually European) vocabulary and a simplified grammar, whose purpose was to facilitate international communication. Johann Schleyer's Volapük (1880) was the first to achieve success; its name is based on English ('world-speech'), and reflects Schleyer's notions of phonetic simplicity.

It was soon eclipsed by Ludwig Zamenhof's Esperanto (1887), whose grammar was simpler and whose vocabulary was more recognizable. Esperanto has remained the most successful and best-known artificial language, with a million or more speakers and a voluminous literature; children of Esperantists have even learned it as a native language.

Its relative success hasn't prevented the appearance of new proposals, such as Ido (1907), Occidental (1922), Novial (1927), and Interlingua (1951). There have also been attempts to simplify Latin (Latino Sine Flexione, 1903) and English (Basic English, 1930) for international use. The recent Loglan (1960) and Lojban (1988), based on predicate logic, may represent a revival of a priori language construction.

See also Andrew Large, The Artificial Language Movement (1985); Mario Pei, One Language For The World (1958); Detlev Blanke, Internationale Plansprachen (in German); Pierre Janton, L'Espéranto (French, 1973), translated as Esperanto (English, 1994).

There is a newsgroup, soc.culture.esperanto, dedicated to Esperanto. The newsgroup alt.language.artificial discusses artificial languages in general.

The ConLang mailing list is devoted to the discussion of constructed and artificial languages for general communication; its FAQ is on the web. To subscribe, e-mail a message to consisting of the single line subscribe conlang.

The AuxLang list is devoted to discussions of the merits and practicality of particular international auxiliary languages. To subscribe, send mail to with the line subscribe auxlang.

If you're interested in creating your own language, check out my Language Construction Kit.