The Language Construction Kit


Grammar

Once you've bundled together some words and perhaps an alphabet, you may think you're done. If you do, it's likely that you've just created an elaborate cipher for English. You still have the grammar to do, bucko.

This section doesn't attempt to cover all the issues in morphology, syntax, and pragmatics. Instead, it suggests what your grammar should minimally do, mentions some of the issues, and lists some interesting approaches taken by various languages.


Is your language inflecting, agglutinating, or isolating?

Inflections are of course affixes used to conjugate verbs and decline nouns. Examples from English are the -s we add to verbs for the 3rd person present form, the -s added to pluralize nouns, and the -ed of the past tense. Languages such as Russian or Latin have complex, not to say baroque, inflectional systems.

A single inflection may encode multiple meanings. For instance, in the Russian form domóv, the -óv ending indicates both plurality and the genitive case; it doesn't bear any evident relationship with other plural endings (e.g. nominative -y) or the singular genitive ending (-a). In Spanish comí 'I ate', the ending indicates the 1st person singular, past tense, indicative mood-- quite a job for one vowel, even accented.

In agglutinating languages, one affix has one meaning. Compare Quechua wasikunapi 'in the houses'; the plural suffix -kuna is separate from the case suffix -pi. Or mikurani 'I ate', in which the past tense suffix -ra- is kept separate from the personal ending -ni.
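
The "one affix, one meaning" idea can be sketched in a few lines of Python. This is only an illustration: the suffixes are the ones quoted above, the roots wasi- and miku- are read off the two example words, the feature labels and function names are made up for the sketch, and real Quechua morphology is not pure concatenation.

```python
# Suffix inventory taken from the two Quechua examples in the text.
# Each suffix carries exactly one meaning; the labels are hypothetical.
SUFFIXES = {
    "plural": "kuna",  # as in wasi-kuna-pi
    "in": "pi",        # case suffix
    "past": "ra",      # past tense
    "1sg": "ni",       # personal ending, 'I'
}


def agglutinate(root, *features):
    """Build a word by appending one suffix per feature, in the order given."""
    return root + "".join(SUFFIXES[f] for f in features)


def segment(root, *features):
    """Same word, with hyphens marking the morpheme boundaries."""
    return "-".join([root] + [SUFFIXES[f] for f in features])


print(agglutinate("wasi", "plural", "in"))  # wasikunapi
print(segment("miku", "past", "1sg"))       # miku-ra-ni
```

An inflecting language would need a lookup keyed on the whole bundle of features at once (say, plural plus genitive), since the ending can't be split into one piece per meaning.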

In isolating languages, there are no suffixes at all; meanings are modified by inserting additional words. In Chinese, for instance, wô chi fàn could mean 'I eat' or 'I was eating', depending on the context; the verb is not inflected at all. For precision, adverbs can be brought in, placed before the verb: wô zuótian chi fàn 'I was eating yesterday'.

(In practice natural languages are all a bit mixed; some inflections have a single meaning; Quechua does have a few inflections, for instance, and Chinese does have required grammatical particles, such as the aspect particle le, used to show completed action: wô chi fàn le 'I ate.')

Conlang creators seem to gravitate toward agglutinating or isolating languages; but there's something to be said for inflections. They tend to be compact, for instance. You can't beat them for succinctness.


Do you have nouns, verbs, and adjectives?

Why not get rid of one or two of them?

It's not hard to get rid of adjectives. One easy way is to treat them as verbs: instead of saying "The wall is red", you say "The wall reds"; likewise, instead of "the red wall" you say "the redding wall".

With such tricks you can even get rid of the verb be, which according to some theorists is responsible for most of the sloppy thinking in the world today. (Heinlein was careful to ban 'to be' from Speedtalk.) About the only response this notion deserves is: would that clear thinking was that easy.

You can extend the idea to get rid of nouns. For instance, in Lakhota, ethnic names are verbs, not nouns. There's a verb 'to be a Lakhota': the present forms mean 'I am a Lakhota, you are a Lakhota, etc.'

You can have some fun with this. "The rock is under the tree" could be expressed as something like "There is stonying below the growing, greening, flourishing", or perhaps "It stones whileunder it grows greeningly." If we really encountered a language like this, however, I'd have to wonder whether we weren't just fooling ourselves. If there's a word that refers to stones, why translate it as 'to stone' rather than simply 'stone'?

Jorge Luis Borges, in "Tlön, Uqbar, Orbis Tertius", posits a language without nouns; but this was because its speakers were Berkeleyan idealists, who didn't believe in object permanence. However, linguists really do not like using semantic classes-- or metaphysics-- to define syntactic categories. (It's not the right level of analysis; and it tends to obscure how languages really work by making them all look like Latin.)

Jack Vance (in The Languages of Pao) posited a language without verbs. For instance, "There are two matters I wish to discuss with you" comes out something like "Statement-of-importance -- in-a-state-of-readiness-- two; ear-- of [place name]-- in-a-state-of-readiness; mouth-- of this person here-- in-a-state-of-volition." Vance may be in a state of pulling our legs.


How do you indicate plural, case, and gender forms of adjectives and nouns?

What's case? It's a way of marking nouns by function: e.g. Latin

mundus   subject or nominative: the world (is, does, ...)
mundum   object or accusative: (something affects) the world
munde    vocative: O world!
mundi    possessive or genitive: the world's
mundo    indirect object or dative: (given, sold, etc.) to the world
mundo    ablative: (something is done) by the world

English actually has cases: possessives like 'world's' are actually genitive case forms; while the subject/object distinction is made with pronouns (I vs. me, we vs. us).

Conlang enthusiasts generally either love case (because it makes a language compact and frees up word order) or hate it (because English doesn't do much with it).

Some languages, such as Basque, have a different arrangement of cases. Instead of the subject of the sentence always being in the same case (the nominative), the subject of intransitive sentences (e.g. "The window broke") and the object of transitive sentences (e.g. "I broke the window") are in the same case, the absolutive, while the subjects of transitive sentences (e.g. "I broke the window") are in the ergative case.

If you think that's weird, a few languages, such as Dyirbal, use the nominative/accusative system for 1st and 2nd person pronouns (I, we, you), and the ergative/absolutive system for nouns and for 3rd person pronouns.
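
The two case arrangements, and the Dyirbal-style split between them, can be laid out as a small Python sketch. The role labels S, A, and O (intransitive subject, transitive subject, transitive object) are a labelling convention introduced here for convenience, and the function names are hypothetical; the case names are the ones used above.

```python
def case_of(role, alignment):
    """Return the case of a core argument.

    role: "S" (subject of an intransitive sentence),
          "A" (subject of a transitive sentence),
          "O" (object of a transitive sentence).
    alignment: "accusative" (Latin-style) or "ergative" (Basque-style).
    """
    if alignment == "accusative":
        return "accusative" if role == "O" else "nominative"
    if alignment == "ergative":
        return "ergative" if role == "A" else "absolutive"
    raise ValueError(f"unknown alignment: {alignment}")


def split_case_of(role, is_1st_or_2nd_person_pronoun):
    """Dyirbal-style split: the system depends on what kind of word it is."""
    alignment = "accusative" if is_1st_or_2nd_person_pronoun else "ergative"
    return case_of(role, alignment)


for alignment in ("accusative", "ergative"):
    print(alignment, {role: case_of(role, alignment) for role in "SAO"})

# "I broke the window" in a split system: the pronoun subject follows the
# nominative/accusative system, the noun object the ergative/absolutive one.
print(split_case_of("A", True), split_case_of("O", False))
```

Either way there are only two cases for three roles; the systems differ in which two roles share a case.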

If a language doesn't have case it may rely on word order to indicate the relationship between a verb's arguments; but there is another alternative: head-marking on the verb. For instance, in the Swahili Kitabu umekileta? 'Did you bring the book?', the verb leta has prefixes indicating the subject (u- 'you') and the object (-ki-, a third person prefix agreeing in gender with kitabu). (-me marks the perfect tense.) The gender-specific object marker on the verb allows free word order even without case marking on the nouns.


Do nouns have gender?

Note that gender need not be simply masculine/feminine. Swahili, for instance, has eight gender classes, none of them masculine/feminine: one is for animals, one for human beings, one for abstract nouns, one forms diminutives, etc.

I daresay not many conlangs have grammatical gender. (Verdurian has it, because it's intended to be naturalistic.) People ask, what is gender for? Gender is remarkably persistent: it's persisted in the Indo-European, Semitic, and Bantu language families for at least five thousand years. It must be doing something useful.

A few possibilities:


Does the verb inflect by person, gender, and/or number?

Like case, personal endings make for nice compact sentences, since if you have them you can generally omit subject pronouns.

Some languages, such as Swahili and Quechua, include the object pronoun in the verb as well, usually as an infix.

The Romance languages have clitic forms of the pronouns, which stop just short of being verb inflections: e.g. French Je le vois, 'I see him'; Spanish Dígame, 'Tell me'.

Basque verbs can inflect to encode information about the listener. For instance, ekarri digute is a neutral way of saying 'They brought it to us'; ekarri zigunate means the same, but also indicates that the listener is a woman addressed with the informal personal pronoun.


What distinctions are made in the verb?

Some distinctions languages make:

Any language can express these distinctions, but they differ in which features are grammaticalized: reflected in the morphology and syntax of the language. English, for instance, grammaticalizes person and number in its verbal system, while Japanese does not. On the other hand Japanese verbs have positive and negative forms, as well as a morphological indication of levels of deference.

Languages also differ in how many distinctions are made in these categories.


What are the personal pronouns?

The basic, universal persons are first (referring to the speaker), second (the hearer), and third (everybody else). However, there's lots of room to play around. Distinctions may be made:

I invented an alien race once that used different pronouns on land and underwater (they were amphibians), and had the inclusive/exclusive and proximate/obviative distinctions. They also had a pronoun for group minds, and pronouns for each of their three sexes. The complete list was impressive.


What are the other pronouns?

To me, the best idea Zamenhof had was his table of correlatives, a nice way to organize all these pronouns. For English, it looks like this:


           QUERY   THIS   THAT    SOME        NO        EVERY
ADJECTIVE  which   this   that    some        no        every
PERSON     who     this   that    someone     no one    everyone
THING      what    this   that    something   nothing   everything
PLACE      where   here   there   somewhere   nowhere   everywhere
TIME       when    now    then    sometime    never     always
WAY        how     thus   -       somehow     -         -
REASON     why     -      -       -           -         -

It's easy and diverting to regularize the table, although natural languages generally leave holes, which must be filled in with phrases ('in that way', 'for no reason').
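
Regularizing the table amounts to picking one morpheme per column and one per row and combining them. Here is a minimal Python sketch; every morpheme in it is made up purely for illustration (none comes from Esperanto or any other language mentioned here), and the function names are hypothetical.

```python
# Hypothetical morphemes: one prefix per column, one suffix per row.
COLUMNS = {"QUERY": "ka", "THIS": "ti", "THAT": "to",
           "SOME": "su", "NO": "ne", "EVERY": "pa"}
ROWS = {"ADJECTIVE": "r", "PERSON": "m", "THING": "s", "PLACE": "n",
        "TIME": "l", "WAY": "k", "REASON": "v"}


def correlative(row, column):
    """A fully regular form: column prefix plus row suffix."""
    return COLUMNS[column] + ROWS[row]


def table():
    """Render the whole grid as aligned plain text."""
    lines = ["".join(f"{h:<10}" for h in [""] + list(COLUMNS))]
    for row in ROWS:
        cells = [row] + [correlative(row, col) for col in COLUMNS]
        lines.append("".join(f"{c:<10}" for c in cells))
    return "\n".join(line.rstrip() for line in lines)


print(table())
```

With seven rows and six columns this yields 42 distinct forms and no holes; a naturalistic language would probably knock a few of them out again.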

You might ask yourself whether the interrogative pronouns ("Who did it?") and the relative pronouns ("Is this the man who did it?") are the same; in some languages they aren't.

Generally, if nouns decline, these pronouns decline the same way. Sometimes they're worse-- English, for instance, retained separate 'from' and 'to' forms for pronouns of place (here / hence = from here / hither = to here) long after such distinctions were lost for ordinary nouns.


What are the numbers?

Are the numbers based on tens, or something else? Many human number systems are based on fives instead. My pronoun-happy aliens had a duodecimal system. Intelligent machines would surely prefer hexadecimal...

How do you form higher numbers? 'Forty-three', for instance, may be formed in several ways:
forty three
four three
forty with three
three and forty
four tens and three
eight fives and three
fifty less seven
twice twenty and three
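
Several of these patterns are just different ways of dividing the number up, which a short Python sketch makes explicit. The helper names and the tiny word list are assumptions made for the example, and it only handles cases where every piece is ten or less.

```python
# English number words for 0..10; enough for the examples below.
WORDS = ["zero", "one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine", "ten"]


def additive(n, base, base_name):
    """Count whole multiples of the base, then add the remainder."""
    count, rest = divmod(n, base)
    return f"{WORDS[count]} {base_name} and {WORDS[rest]}"


def subtractive(n, base, base_name):
    """Go up to the next multiple of the base, then count back down."""
    count, rest = divmod(n, base)
    return f"{WORDS[count + 1]} {base_name} less {WORDS[base - rest]}"


print(additive(43, 10, "tens"))      # four tens and three
print(additive(43, 5, "fives"))      # eight fives and three
print(additive(43, 20, "twenties"))  # two twenties and three
print(subtractive(43, 10, "tens"))   # five tens less seven
```

The last two lines correspond to 'twice twenty and three' and 'fifty less seven'; only the base and the direction of counting change.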

Where nouns decline, numbers may also. Or they may not. In Latin, you stop declining the numbers at four.

In Indo-European languages we are used to unanalyzable roots for the numbers; but in other families number names are derivations, often related to the process of counting on fingers and toes-- e.g. Choctaw 5 = tahlapi 'the first (hand) finished'; Klamath 8 ndan-ksahpta 'three I have bent over'; Unalit 11 atkahakhtok 'it goes down (to the feet)'; Shasta 20 tsec 'man' (considered as having 20 countable appendages).

For more on numbers, see the Sources page of my Numbers from 1 to 10 in Over 2000 Languages page.


What about adjectives?

Adjectives can be something like nouns, something like verbs, or like neither. If they're like nouns, they generally agree with their head noun in gender, case, and number. If they're like verbs, they conjugate like verbs.

How are comparative expressions ("holier than thou", "most holy", "as holy as thou") formed?

It's useful to have some regular derivations for or from adjectives:
opposite (un-)
lack (-less) or surfeit (-ful)
possibility (-able)
liking (-phile) or disliking (-phobe)
inhabitant (-er, -ian, -an, -ese)
weakening of meaning (-ish)
strengthening of meaning (to the max)
adverb (-ly)


Are there articles (a, the)?

Many languages, such as Latin and Russian, get by quite happily without them.

It may help to understand what the distinction really means. Ordinarily it's pragmatic: 'the' can be paraphrased 'You know which one I'm talking about'. Consider:

I saw a man at the rodeo. The man had on a horrid plaid suit.

'A man' in the first sentence signals that this character is being introduced in this conversation; 'the' in the second sentence signals that he's old news, he is in fact the same guy we just started talking about. The 'the' before 'rodeo' also indicates that the speaker expects that the hearer can figure out which rodeo-- if not, he'd have said 'a rodeo'.

Word order serves the same function in Russian. There you'd say, in effect,

I saw man in rodeo. Man wore horrid plaid suit.

When he's introduced, the man lives near the end of the sentence; when he's old news, he appears at the front.

(Actually, they don't have many rodeos in Russia.)


What order do the various components of a noun phrase appear in?

Consider articles, numbers, quantifiers, adverbs, adjectives, possessives, subordinate clauses-- e.g.

The ten very happy robots who passed the bar exam

You can generally divide phrases into heads and modifiers. Some languages are very consistent about placing all modifiers before, or all after the head. English is head-final, with the exception of subordinate clauses. Japanese is head-final too, but it's more consistent: it would say "the bar exam passed robots".


What order do the various components of a sentence appear in?

Linguists like to talk about the order of subject, object, and verb, which of course can occur in just six combinations: SVO (as in English or Swahili), SOV (Latin, Quechua, Turkish), VSO (Welsh), OVS (Hixkaryana), OSV (Apurinã), VOS (Malagasy). The last three are for some reason rare, although they do exist.
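
The six orders are simply the permutations of three elements, which makes it easy to try a sample sentence in each. The following Python sketch assumes nothing beyond that; the function name and the sample sentence are made up for the illustration.

```python
from itertools import permutations

# All orderings of Subject, Verb, Object: 3! = 6 of them.
ORDERS = ["".join(p) for p in permutations("SVO")]


def arrange(order, subject, verb, obj):
    """Lay out a simple clause in the given constituent order."""
    parts = {"S": subject, "V": verb, "O": obj}
    return " ".join(parts[slot] for slot in order)


for order in ORDERS:
    print(order, "->", arrange(order, "the bear", "drinks", "beer"))
```

Running every test sentence through the order you've chosen is a quick way to check whether the language still sounds like English with the words swapped out.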

Combinations and complications are common; for instance, German is basically SOV, but a finite verb (anything but a participle or an infinitive) appears after the subject in a main clause:

Mein Vater ist vor einigen Tagen nach London gefahren.
My father has several days ago to London travelled.

(German isn't usually described this way; but my way is equally correct, and requires only one exception. The usual approach requires two exceptions, one for nonfinite verbs in the main clause, one for subclauses.)


How do you form a relative clause (the man who...)?

It can be useful to think about relative clauses using transformational grammar. For instance, a sentence like

The man that John hit yesterday prefers beer to wine.

can be seen as deriving by transformation from one sentence that's embedded in another:

The man [John hit him yesterday] prefers beer to wine.

In English, you can think of relativization as proceeding in two steps: a) replacing the pronoun in the subclause with an interrogative pronoun (or 'that')

The man [John hit whom yesterday] prefers beer to wine.

and b) moving that pronoun to the head of the clause.

The man [whom John hit yesterday] prefers beer to wine.
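
The two steps can be mimicked on a list of words. This Python sketch is a toy, not a parser: it assumes the subclause is already split into words and that the pronoun to be relativized occurs exactly once; the function name is hypothetical.

```python
def relativize(subclause, pronoun, relative="whom"):
    """Return the subclause after step (a) and after step (b).

    (a) replace the pronoun with the relative pronoun, in place;
    (b) move the relative pronoun to the head of the clause.
    """
    index = subclause.index(pronoun)
    rest = subclause[:index] + subclause[index + 1:]
    step_a = subclause[:index] + [relative] + subclause[index + 1:]
    step_b = [relative] + rest
    return step_a, step_b


step_a, step_b = relativize("John hit him yesterday".split(), "him")
print(" ".join(step_a))  # John hit whom yesterday
print(" ".join(step_b))  # whom John hit yesterday
```

A language that leaves the pronoun where it is would stop after step (a), or skip even that.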

Your language may also put limits on what exactly can be relativized. The following examples are legal in English, for instance, but not in certain other languages.

the girl [you think [I love her]]
>> the girl you think I love
the neighbor [I traumatized his pastor]
>> the neighbor whose pastor I traumatized
the cat [I said [Alesia brought it home]]
>> the cat that I said Alesia brought home

Not everything is possible in English:

This is the man [my girlfriend's father is a friend of John and him]
>> This is the man that my girlfriend's father is a friend of John and.
or (thanks to Leo Connolly for this example)
There's the barn [more people have gotten drunk down in back of it than any other barn in the county]
>> There's the barn that more people have gotten drunk down in back of than any other barn in the county.

Some languages can handle such sentences simply by leaving the pronoun in the subclause. S.J. Perelman liked to do this in English:

"That's the man which my wife is sleeping with him!"

If your language has cases, you must be careful to put the pronouns in the right case-- English doesn't give you the right instincts here, now that 'whom' is used only by pedants like me. Generally the proper case to use is the one that would be appropriate in the subclause. In "The cat that I said Alesia brought home", for instance, the 'that' representing the cat should be in the case appropriate for 'the cat' in "Alesia brought the cat home".

Quechua has an interesting way of forming clauses, using participles. For instance:

Chakra-y yapu-q runa-ta qaya-mu-saq
field-my plow-participle man-accusative call-[movement-toward]-[I-future]
I'll call the man that plowed my field.

Rather than the form of an ordinary sentence ("the man plowed my field"), the subclause has the form of a participle ("the my-field-plowing man").

How do you form yes-no questions?

English has a rather baroque procedure (inverting subject and verb). Other languages simply make use of a rise in intonation, or add a particle at the beginning of the sentence (e.g. Polish czy) or to the verb.

Many languages offer ways of suggesting the answer to the question. For instance, the Latin particle num expects the answer 'no' (Num ursi cerevisiam imperant? Bears don't order beer, do they?), while nonne expects 'yes' (Nonne ursus animal implume bipes? Bears are featherless bipeds, aren't they?).

Where questions are formed by appending a particle (e.g. -ne in Latin, or -chu in Quechua), the particle can be added directly to the word being questioned. We can only achieve the same effect in English by emphasis (Is the BEAR drinking beer? Is the bear drinking BEER?) or by rearrangement (Is it beer that the bear is drinking?).

One way of asking a question in Chinese is to offer the listener a choice: Nî shì bu shì Bêijing rén? "You're from Beijing?", literally "You be, not be from Beijing?"

Some folks, believe it or not, get by without having words for 'yes' or 'no'. The usual workaround is to repeat the verb from the question: "Do you know the way to San José?" can be answered "I know" or "I don't know", as in Portuguese:

--Você conhece o caminho que vai a São José?
--Conheço.
['I know']

How about other questions?

English usually moves the question word to the beginning of the sentence, but other languages don't, asking in effect "You said what?" or "She's going out with whose boyfriend?"

Also note that some languages have different pronouns for relative clauses ("The man who fishes") and questions ("Who is this man?").


How do you negate a sentence?

Again, there are many options:


How do conjunctions work?

Latin has a neat trick: to express X and Y, you can say X Y-que, using a clitic. The expression SPQR, Senatus Populusque Romanus, is an example of this construction: the Senate and the People of Rome.

Latin also distinguishes inclusive and exclusive or: vel X vel Y means that you can have X or Y or both, but aut X aut Y means you get one or the other but not both.
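
In truth-table terms, the distinction as described here is the one between logical OR and exclusive OR. A minimal Python sketch, with function names borrowed from the Latin words purely as labels:

```python
def vel(x, y):
    """Inclusive or: X, or Y, or both."""
    return x or y


def aut(x, y):
    """Exclusive or: one or the other, but not both."""
    return x != y


print("X      Y      vel    aut")
for x in (False, True):
    for y in (False, True):
        print(f"{x!s:<6} {y!s:<6} {vel(x, y)!s:<6} {aut(x, y)!s:<6}")
```

The two differ only in the last row, where both X and Y hold.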

Quechua (before the Spanish conquest) got by without conjunctions at all. For adding things together, you can usually get by with juxtaposition. Or you can use a case ending meaning with: in effect you say 'X and Y' by saying 'X with Y'. I'm not sure how disjunctions ('or') were handled-- today Quechua uses forms borrowed from Spanish.


Style

A natural language has a wide variety of registers, or styles of speech: from the ceremonial or ritual, to the official or scientific, to the journalistic or novelistic, to ordinary conversation, to colloquial, to slang. Children talk in their own way; so do poets. The upper crust speaks differently from the lower classes.

Some of these registers work in predictable ways. For instance, rites are often conducted in an archaic form of the language (or sometimes another language entirely). Educated speech usually includes older, longer, foreign, or technical words. In Verdurian, for instance, educated speech borrows many words from the parent language, Cadhinor.

Slang often provides humorous substitutions for common words. Some such substitutions from Vulgar Latin have become the normal word in the Romance languages: testa 'pot' replaced caput 'head', giving French tête; bucca 'cheek' replaced os 'mouth', giving bouche; caballus 'nag' replaced equus 'horse', giving cheval.

Slang also borrows from minority groups: e.g. French toubib, chnouf, bled from Arabic; English shiv and pal from the Gypsies, schlock from Yiddish, jazz and jive from blacks; Spanish calato and cachaco from Quechua.


Politeness

All cultures have ways of expressing politeness, but they differ in the methods used, and in what ways politeness is grammaticalized.

According to Anna Wierzbicka, polite speech in English lays great stress on respecting others and avoiding imposition. English has a vast array of indirect forms for asking people to do things, or even for offering them things: Will you have a drink? Would you like a drink? Sure you wouldn't like a beer? Why don't you pour yourself something? How about a beer? Aren't you thirsty? We're so used to such pseudo-questions that we use them rather than a direct imperative even when actual politeness is far from our minds: Will someone put this fucking idiot out of his misery? For Christ's sake, will you get lost?

In Polish, by contrast, a courteous host pushes his hospitality on the guest, dismissing the guest's expressed remonstrances and desires as irrelevant: Proszę bardzo! Jeszcze troszkę! --Ale już nie mogę! --Ale koniecznie! "Please, a little more!" "But I can't!" "But you must!" And Polish is very free with imperatives-- indeed, to be really forceful you must use the infinitive instead.

Japanese is often even more indirect than English: e.g. it avoids the imperative "Drink Coca-Cola!" in favor of Koka kora o nomimashou! (lit. "We will drink Coca-Cola!").

Japanese is also notable for having verbal inflections which add a level of politeness (e.g. tetsudau 'helps'; polite form tetsudaimasu), as well as entirely different lexical items with the same purpose (e.g. iku 'go', humble form mairu, honorific irassharu).

Terms of address are a fertile field for exquisite complications; so are pronouns. In quite a few languages it's perceived as rather a familiarity to address someone using the second person pronoun: to be polite you use the plural (French vous), or a third-person form (Italian Lei, Spanish Usted from vuestra merced 'your mercy', Portuguese o senhor 'the gentleman'), or a title (Japanese sensei 'teacher', otousan 'father', etc.). If this seems odd, it's worth noting that English took the first approach, so thoroughly that the second person singular pronoun 'thou' disappeared.

Attempts have been made to formulate universals of politeness, but this can be tricky. E.g. it's been suggested that politeness involves avoiding disagreement; but in Jewish culture disagreement expresses sociability and is taken as bringing people closer together. Or, it's been said that direct praise of oneself is avoided, and praise of others is approved; but self-praise among Black American speakers is good form, and direct praise of others is avoided in Japanese.


Poetry

For poetry you must consult your own Muse. However, it's worth pointing out that rhyme is not the only thing poetry can be based on:


Language families

You can add enormous depth to a fantasy language by giving it a history, and relatives. Verdurian and its sister language Barakhinei, for instance, derive from Cadhinor, as French and Spanish derive from Latin. Cadhinor, Cuêzi, and Xurnásh, in turn, all derive from Proto-Eastern, and thus are related in systematic ways, much as Latin, Greek, and Sanskrit all derive from proto-Indo-European.

What can you do with such relationships?

Words often change meaning as they're borrowed. Some cute examples from Verdurian:


How do you do it?

To do this well you have to know something about historical linguistics. The sci.lang FAQ will give a brief overview. Better yet, read Theodora Bynon's excellent Historical Linguistics, or Hans Henrich Hock's more thorough Principles of Historical Linguistics.

The basic principle is that sound change is almost completely regular. This is good news: it means all you have to do is devise a set of sound changes between the parent language and its derivative(s), and apply them to each word.

Here, for instance, are just some of the sound changes from Cadhinor to Verdurian:

A different set of sound changes can be used to create a sister language. For instance, Barakhinei changes unvoiced consonants to voiced between vowels (this is an extremely common change in languages), loses the final sound of each word, etc. The net result is a language related to but subtly different from Verdurian:

Cadhinor   Verdurian   Ismaîn    Barakhinei   gloss
prosan     prosan      prozn     proza        'walk'
molenia    mólnia      moleni    molenhi      'lightning'
ueronos    örn         rone      feron        'eagle'
aestas     esta        este      âshta        'summer'
laudan     lädan       luzn      laoda        'go'
geleia     zhelea      jeleze    gelech       'calm'
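
Because sound change is regular, each change can be written as a small function and the whole set applied to every word in turn. The Python sketch below implements only the two Barakhinei changes named above (voicing between vowels, loss of the final sound), and in a simplified form: the consonant pairs and the vowel list are assumptions, not the real Barakhinei inventory, and all the names are hypothetical. Of the words in the table, only prosan comes out right from these two rules alone; the others need changes not described here.

```python
import re

# Voiceless -> voiced pairs; an illustrative subset, assumed for this sketch.
VOICING = {"p": "b", "t": "d", "k": "g", "s": "z", "f": "v"}
VOWELS = "aeiou"

# A voiceless consonant with a vowel immediately on either side.
_INTERVOCALIC = re.compile(
    "(?<=[" + VOWELS + "])[" + "".join(VOICING) + "](?=[" + VOWELS + "])"
)


def voice_between_vowels(word):
    """Change unvoiced consonants to voiced between vowels."""
    return _INTERVOCALIC.sub(lambda m: VOICING[m.group(0)], word)


def drop_final_sound(word):
    """Lose the final sound (here simply the last letter) of the word."""
    return word[:-1]


def apply_sound_changes(word, changes):
    """Apply each change in order; the order of the list is the chronology."""
    for change in changes:
        word = change(word)
    return word


BARAKHINEI_SKETCH = [voice_between_vowels, drop_final_sound]

print(apply_sound_changes("prosan", BARAKHINEI_SKETCH))  # proza
```

A sister language is then just a different list of changes run over the same parent vocabulary.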

If you're interested in applying sound changes to one language in order to generate a descendent language, you may find my Sound Change Applier program useful.


Dialects

You can use the same technique to create dialects for your language. Linguistically, dialects are simply a set of language varieties which haven't diverged far enough apart that their speakers can't understand each other. Dialects can be created simply by specifying a smaller number of less dramatic sound changes.

For instance, the Verdurian dialect of Avéle is characterized by the following changes:

Dialects can also have their own lexical terms, of course, perhaps borrowed from neighbors or previous inhabitants of the local territory.

People often suppose that the dialect of the capital city (or whatever other place has supplied the standard language) is more 'pure' or more conservative than provincial speech. In fact the opposite is likely to be true: the active center of a culture will see its speech change fastest; rural or isolated areas are more likely to preserve older forms.

If you're inventing an interlanguage you may of course want to do everything possible to prevent the rise of dialects. This is probably an expression of the fascistic streak common to language tinkerers. Why not design your interlanguage with dialects, reflecting the phonology of various linguistic regions? The resulting language, with varieties close to the major natural languages, might achieve more acceptance than uniform interlanguages have.


The Language Construction Kit is © 1996 by Mark Rosenfelder.