The Language Construction Kit

Grammar

Once you’ve bundled together some words and perhaps an alphabet, you may think you’re done. If you do, it’s likely that you’ve just created an elaborate cipher for English. You still have the grammar to do, bucko.

To linguists, a grammar is a full description of a language, including:

Phonology, the sounds of the language, which we’ve already covered
Morphology, how words are formed, whether by inflections, compounding, or more exotic ways
Syntax, which is about how words are arrayed into sentences, so it includes word order and constructions that depend on separate words or particles
Semantics, the study of meaning, including how it changes over time and how words relate to each other
Pragmatics, how language is actually used in the world, and how meanings change in context

We’ll start with morphology, but after that I’m going to simply describe a bunch of features that you might want to put in your language and suggest some alternatives. A given feature may be implemented by morphology or by syntax— that’s one of the choices you’ll be making.

I’ll touch only briefly on semantics and pragmatics, but we’ll talk about where to find more info.

Is your language fusional, agglutinative, or isolating?

Inflections are affixes used to conjugate verbs and decline nouns. Examples from English are the -s we add to verbs for the 3rd person present form, the -s added to pluralize nouns, and the -ed of the past tense. Languages such as Russian or Latin have complex, not to say baroque, inflectional systems.

In agglutinative languages, each affix has a single meaning. For instance, Quechua wasikunapi ‘in the houses’; the plural suffix -kuna is separate from the case suffix -pi. Or mikurani ‘I ate’, in which the past tense suffix -ra- is kept separate from the personal ending -ni.

By contrast, in fusional languages, a single inflection may encode multiple meanings. For instance, in the Russian домов domóv, the -óv ending indicates both plurality and the genitive case; it doesn’t bear any evident relationship with other plural endings (e.g. nominative -á) or the singular genitive ending (-a). In Spanish comí ‘I ate’, the -í ending indicates the 1st person singular, past tense, indicative mood— quite a job for one vowel, even accented.
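The contrast between the two types can be sketched in code. This is only a toy, using the Quechua and Russian forms cited above; the function names and dictionary layout are invented for illustration. The agglutinative builder strings together one suffix per meaning, while the fusional one has to look up a single ending for each combination of meanings.

```python
# Agglutinative: one suffix per meaning, concatenated in order.
# Suffixes are the Quechua ones cited in the text.
AGGLUTINATIVE_SUFFIXES = {
    "plural": "kuna",
    "locative": "pi",
}

def agglutinate(root, features):
    """Append one suffix per feature, in the order given."""
    return root + "".join(AGGLUTINATIVE_SUFFIXES[f] for f in features)

# Fusional: one ending per *combination* of meanings; it cannot be split
# into a "plural part" and a "genitive part".
# Endings are the Russian ones cited in the text for this noun.
FUSIONAL_ENDINGS = {
    ("genitive", "plural"): "óv",
    ("genitive", "singular"): "a",
    ("nominative", "plural"): "á",
}

def fuse(stem, case, number):
    """Look up the single ending that encodes both case and number."""
    return stem + FUSIONAL_ENDINGS[(case, number)]

print(agglutinate("wasi", ["plural", "locative"]))
print(fuse("dom", "genitive", "plural"))
```

Note how the fusional table grows with every combination you add, while the agglutinative one grows only with each new meaning.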

In isolating languages, there are no suffixes at all; meanings are modified by inserting additional words. In Chinese, for instance, wǒ chī fàn could mean ‘I eat’ or ‘I was eating’, depending on the context; the verb is not inflected at all. For precision, adverbs or particles can be brought in: wǒ zuótiān chī fàn ‘I was eating yesterday’, wǒ chī fàn le ‘I’ve eaten (i.e. I ate and finished)’.

Polysynthetic languages incorporate nouns or other roots within the verb. For instance, Nishnaabemwin naajmiijme ‘fetch food’ incorporates miijim ‘food’. The incorporated form may differ from the noun normally used as a standalone word.

In practice natural languages are all a bit mixed; some inflections in fusional languages have a single meaning; Quechua does have a few fused inflections, and Mandarin does have a few suffixes.

Conlang creators seem to gravitate toward agglutinative or isolating languages; but there’s something to be said for fusional inflections. They tend to be compact, for instance. You can’t beat -í for succinctness.

How do you form inflections?

The inflections of the Indo-European languages lean heavily toward suffixes: cf. Spanish Las mujeres jóvenes bailan ‘The young women dance’.
The Bantu languages prefer prefixes: cf. Swahili Kisu kimoja kilitosha ‘One knife was enough’.
Infixes are inserted within a root. My conlang Kebreni has the infix -su- for ‘made of X’: siva ‘sand’ → sisuva ‘sandy’.
Vowel change is extensively used in the Semitic languages for both inflectional and derivational morphology. E.g. Arabic KTB ‘write’ has such forms as yaktubu ‘he writes’, kitba ‘writing’, kitāb ‘book’, and kātib ‘writer’. In Munkhâshi I made use of consonant changes in verbal paradigms; e.g. the B/D/E rank forms of ‘be’ are khath, khat, gat.
Reduplication repeats all or part of the root. Sanskrit formed its perfect this way; e.g. tan- ‘stretch’ had the perfect form tatan-.
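Three of the mechanisms in the list above can be tried out in a few lines. This is a rough sketch, not a description of how Kebreni, Sanskrit, or Arabic actually work: the rule "cut just after the first vowel" is an assumption that happens to fit the cited examples, and the digit-based template notation is invented here.

```python
VOWELS = set("aeiou")

def first_syllable_end(root):
    """Index just past the first vowel (a crude stand-in for syllabification)."""
    for i, ch in enumerate(root):
        if ch in VOWELS:
            return i + 1
    raise ValueError("root has no vowel")

def infix(root, affix):
    """Insert the affix after the first (assumed) syllable of the root."""
    cut = first_syllable_end(root)
    return root[:cut] + affix + root[cut:]

def reduplicate(root):
    """Copy the first (assumed) syllable onto the front of the root."""
    cut = first_syllable_end(root)
    return root[:cut] + root

def apply_template(consonants, template):
    """Replace the digits 1, 2, 3 in the template with the root consonants."""
    return "".join(
        consonants[int(ch) - 1] if ch.isdigit() else ch
        for ch in template
    )

print(infix("siva", "su"))
print(reduplicate("tan"))
for template in ("1ā2i3", "1i2ā3", "ya12u3u"):
    print(apply_template("ktb", template))
```

Writing the rules down this way is a quick check that your paradigms are regular where you meant them to be.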

How do you form fused inflections? The simplest way is to derive them from an earlier, worn-down set of agglutinative inflections. But there are other paths (such as confusion between different sets of paradigms), so you can also just invent them.

In the following sections, be aware that the possible approaches may include inflections, separate particles, word order, and more. So (say) the negative may belong to the morphology in one language, to syntax in another.

Do you have nouns, verbs, and adjectives?

Why not get rid of one or two of them?

It’s not hard to get rid of adjectives. One easy way is to treat them as verbs: instead of saying "The wall is red", you say "The wall reds"; likewise, instead of "the red wall" you say "the redding wall".

With such tricks you can even get rid of the verb be, which according to some theorists is responsible for most of the sloppy thinking in the world today. (Heinlein was careful to ban ‘to be’ from Speedtalk.) About the only response this notion deserves is: would that clear thinking were that easy.

You can extend the idea to get rid of nouns. For instance, in Lakhota, ethnic names are verbs, not nouns. There’s a verb ‘to be a Lakhota’: the present forms mean ‘I am a Lakhota, you are a Lakhota, etc.’

You can have some fun with this. "The rock is under the tree" could be expressed as something like "There is stonying below the growing, greening, flourishing", or perhaps "It stones whileunder it grows greeningly." If we really encountered a language like this, however, I’d have to wonder whether we weren’t just fooling ourselves. If there’s a word that refers to stones, why translate it as ‘to stone’ rather than simply ‘stone’?

Jorge Luis Borges, in "Tlön, Uqbar, Orbis Tertius", posits a language without nouns; but this was because its speakers were Berkeleyan idealists, who didn’t believe in object permanence. However, linguists really do not like using semantic classes, or metaphysics, to define syntactic categories. (It’s not the right level of analysis; and it tends to obscure how languages really work by making them all look like Latin.)

Jack Vance (in The Languages of Pao) posited a language without verbs. For instance, "There are two matters I wish to discuss with you" comes out something like "Statement-of-importance — in-a-state-of-readiness— two; ear— of [place name]— in-a-state-of-readiness; mouth— of this person here— in-a-state-of-volition." Vance may be in a state of pulling our legs.

Can you make a case?

What’s case? It’s a way of marking nouns by function: e.g. Latin

mundus   subject or nominative       the world (is, does, ...)
mundum   object or accusative        (something affects) the world
munde    vocative                    O world!
mundi    possessive or genitive      the world’s
mundo    indirect object or dative   (given, sold, etc.) to the world
mundo    ablative                    (something is done) by the world

Our possessives (‘world’s’) started out as genitive case forms, though they’re really particles today. Most of our pronouns still have nominatives and accusatives (I vs. me, we vs. us).

Conlang enthusiasts generally either love case (because it makes a language compact and frees up word order) or hate it (because English doesn’t do much with it).

Not all case systems work the same way. Consider these roles:

A. subject of transitive sentences: I broke the window
B. object of transitive sentences: I broke the window
C. subject of intransitive sentences: the window broke

English and Latin treat A and C alike, using the nominative, and mark B with the accusative. But some languages, such as Basque, group B and C together as the absolutive case, leaving A in the ergative case. (In a way it’s more logical... after all, the window always has the same semantic role, so in ergative/absolutive languages it always has the same case.)

If you think that’s weird, a few languages, such as Dyirbal, use the nominative/accusative system for 1st and 2nd person pronouns (I, we, you), and the ergative/absolutive system for nouns and for 3rd person pronouns.
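The two alignments, and the split just described, amount to different mappings from the roles A, B, and C to case names. Here is a minimal sketch; the function names and the boolean flag are invented for illustration, and the split rule simply restates the Dyirbal description above rather than modeling the language in any detail.

```python
# Roles from the text: A = transitive subject, B = transitive object,
# C = intransitive subject.
ALIGNMENTS = {
    "nominative-accusative": {
        "A": "nominative", "B": "accusative", "C": "nominative",
    },
    "ergative-absolutive": {
        "A": "ergative", "B": "absolutive", "C": "absolutive",
    },
}

def case_for(role, alignment):
    """Return the case a noun phrase takes for a role under an alignment."""
    return ALIGNMENTS[alignment][role]

def split_case_for(role, is_first_or_second_person_pronoun):
    """Split system: 1st/2nd person pronouns pattern one way, the rest the other."""
    if is_first_or_second_person_pronoun:
        return case_for(role, "nominative-accusative")
    return case_for(role, "ergative-absolutive")

# 'the window' in "I broke the window" (B) and "the window broke" (C):
for alignment in ALIGNMENTS:
    print(alignment, case_for("B", alignment), case_for("C", alignment))
```

Under the ergative/absolutive mapping the window gets the same case in both sentences, which is the point made above.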

You can have case without inflections, by using particles— e.g. Japanese o marks the accusative, no the genitive.

If a language doesn’t have case it may rely on word order to indicate the relationship between a verb’s arguments; but there is another alternative: head-marking on the verb. For instance, in the Swahili Kitabu umekileta? ‘Did you bring the book?’, the verb leta has prefixes indicating the subject (u- ‘you’) and the object (-ki-, a third person prefix agreeing in gender with kitabu). (-me marks the perfect tense.) The gender-specific object marker on the verb allows free word order even without case marking on the nouns.
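The slot structure of umekileta can be sketched in a few lines. This is a toy, assuming a fixed subject, tense, object, root order and using only the morphemes glossed above; the function name and dictionary labels are invented, and real Swahili verbs have more slots than this.

```python
# Morphemes are the ones glossed in the text; the keys are ad hoc labels.
SUBJECT = {"you (s.)": "u"}
TENSE = {"perfect": "me"}
OBJECT = {"ki-class noun": "ki"}  # agrees in gender with kitabu

def build_verb(subject, tense, obj, root):
    """Concatenate the prefixes in subject-tense-object order before the root."""
    return SUBJECT[subject] + TENSE[tense] + OBJECT[obj] + root

print(build_verb("you (s.)", "perfect", "ki-class noun", "leta"))
```

Because the object prefix already identifies the gender of the object, the noun itself is free to move around the sentence.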

Do nouns have gender?

Gender need not be simply masculine/feminine. Swahili, for instance, has eight gender classes, none of them masculine/feminine: one is for animals, one for human beings, one for abstract nouns, one forms diminutives, etc. Algonquian languages have animate/inanimate genders instead. For a conlang I created physical/spiritual genders.

Conlangers used to avoid gender, back when they were mostly creating auxlangs. But it’s a nice addition to a naturalistic language; Verdurian has masculine and feminine gender.

People ask, what is gender for? Gender is remarkably persistent: it’s persisted in the Indo-European, Semitic, and Bantu language families for at least five thousand years. It must be doing something useful.

A few possibilities:

In a gendered language like Spanish, adjectives agree in number and gender with nouns: los toros poderosos ‘the powerful bulls’. This helps tie adjectives and nouns together, reducing the functional load on word order and adding useful clues for parsing.
It gives language (in John Lawler’s terms) another dimension to seep into. In French, for instance, there are many words that vary only in gender: port/porte, fil/file, grain/graine, point/pointe, sort/sorte, etc. Changing gender must have once been an easy way to create a subtle variation on a word.
It allows indefinite references to give someone’s sex.
It offers some of the advantages of obviative pronouns (see below): one may have two or more third person pronouns at work at the same time, referring to different things.
It can support free word order without case marking, as in the Swahili example above.

What else is marked on the noun?

The noun can have other markings too, such as:

Plurality, as in English. Some languages have dual forms for pairs of things.
Honorifics, as in Japanese o-.
Topic, like Quechua -qa, or the Swedish postposed article (flickan ‘the girl’).
Possession: e.g. Quechua wasi ‘house’ → wasiyki ‘your house’.
Diminutives and augmentatives are very useful.

Does the verb inflect by person and number?

Like case, personal endings make for nice compact sentences, since if you have them you can generally omit subject pronouns. Here’s an example from Spanish; note that English has a remnant of person/number agreement with the -s ending.

hablo      I speak
hablas     you (s.) speak
habla      he/she speaks
hablamos   we speak
habláis    you (pl.) speak
hablan     they speak

Some languages, such as Swahili and Quechua, include the object pronoun in the verb as well, usually as an infix. Quechua rimasunki means ‘he is speaking to you (s.)’.

The Romance languages have clitic forms of the pronouns, which stop just short of being verb inflections: e.g. French Je le vois, ‘I see him’; Spanish Dígame, ‘Tell me’.

Basque verbs can inflect to encode information about the listener. For instance, ekarri digute is a neutral way of saying ‘They brought it to us’; ekarri zigunate means the same, but also indicates that the listener is a woman addressed with the informal personal pronoun.

What else can you put on the verb?

Some distinctions languages make on their verbs:

time, of course (tense strictly speaking)
whether the action is completed (grammarians say perfect) or not
whether the focus is on the ongoing process (progressive), or a single action, or a habitual action, or a repeated action (all these are aspects)
whether the action can be counted on (indicative mood), or is doubtful or merely to be desired (subjunctive), or isn’t happening at all (negative)
whether I’m telling you (indicative again) or ordering you (imperative)
whether the speaker knows about the action from personal experience, or merely from hearsay, or merely considers it probable (evidentiality)
whether the verb is intransitive (it just happens) or transitive (it happens to something) or reflexive (it happens to the subject)
whether the verb simply describes a state (static) or reports a change in state (dynamic). In my conlang Caďinor, for instance, scadran means ‘ride’ in its static forms, ‘mount’ in its dynamic forms; ciloran is static ‘need, lack’ and dynamic ‘run out of’.
degree of deference between speaker and listener
who benefits from an action (a benefactive)
the speaker’s emotional reaction (e.g. Quechua -lla which expresses fear or lamentation, or -ru- for urgency)

Any language can express these distinctions, but languages differ in which features are grammaticalized, that is, reflected in the morphology and syntax of the language. English, for instance, grammaticalizes person and number in its verbal system, while Japanese does not. On the other hand Japanese verbs have positive and negative forms, as well as a morphological indication of levels of deference.

Languages also differ in how many distinctions are made in these categories.

There is an Austronesian language which has four past tenses (last night, yesterday, near past, remote past) and three futures (immediate, near, remote).
The languages of the Vaupés river basin distinguish five levels of evidentiality: visual perception; non-visual perception; deduction from obvious clues; hearsay; and mere assumption.

What are the personal pronouns?

The basic, universal persons are first (referring to the speaker), second (the hearer), and third (everybody else), and usually there are separate singular and plural forms. Turkish neatly fits this six-cell grid:

             singular   plural
1st person   ben        biz
2nd person   sen        siz
3rd person   o          onlar

However, there’s lots of room to play around. Distinctions may be made:

by gender (not necessarily just in the third person— cf. Arabic ʔanti ‘you (s. f.)’)
not by gender (many languages don’t distinguish ‘he’ and ‘she’)
by number (I vs. we... sometimes there are special dual forms for pairs of things; also note that many languages form the plurals with a regular suffix: Mandarin wǒ ‘I’ → wǒmen ‘we’)
not by number (it’s an optional distinction in Chinese)
by animacy (cf. he/she vs. it)
whether ‘we’ includes ‘you’ (inclusive we) or not (exclusive we)
by level of formality or politeness
by whether third persons are present or not
between two sets of third persons (proximate and obviative)— imagine having two forms of ‘he’ to distinguish two different persons
between real and hypothetical reference: e.g. English ‘one’, French on

It’s possible to bag the third person by using demonstratives instead (this one, that one). Many cultures seem to feel that raw pronouns are a little impolite, and use titles instead. Miss Manners informs us that the Holy Roman Emperor properly referred to himself as ma majesté.

I invented an alien race once that used different pronouns on land and underwater (they were amphibians), and had the inclusive/exclusive and proximate/obviative distinctions. They also had a pronoun for group minds, and pronouns for each of their three sexes. The complete list was impressive.

What are the other pronouns?

Esperanto has a table of correlatives, a nice way to organize all the non-personal pronouns. For English, it looks like this:

            query   this   that    some        no        every
adjective   which   this   that    some        no        every
person      who     this   that    someone     no one    everyone
thing       what    this   that    something   nothing   everything
place       where   here   there   somewhere   nowhere   everywhere
time        when    now    then    sometime    never     always
way         how     thus           somehow
reason      why

The first column comprises interrogative pronouns; the next two are demonstratives, and the rest are indefinite pronouns. The adjectives no, some, most, every are quantifiers.

It’s easy and diverting to regularize the table, although natural languages generally leave holes, which must be filled in with phrases (‘in that way’, ‘for no reason’).
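Regularizing the table amounts to crossing a small set of beginnings with a small set of endings. As a sketch, here are Esperanto's own correlatives generated that way; the forms are Esperanto's as I understand them, while the English row and column labels and the function names are just illustrative. Esperanto has a single demonstrative series, so there is one "that" column rather than separate "this" and "that".

```python
# Column beginnings and row endings of the Esperanto correlatives.
PREFIXES = {"query": "ki", "that": "ti", "some": "i", "no": "neni", "every": "ĉi"}
ENDINGS = {
    "person": "u", "thing": "o", "place": "e",
    "time": "am", "way": "el", "reason": "al",
}

def correlative(column, row):
    """Build one cell of the table from its column prefix and row ending."""
    return PREFIXES[column] + ENDINGS[row]

def table():
    """Build every cell; a regular table has no holes."""
    return {
        row: {column: correlative(column, row) for column in PREFIXES}
        for row in ENDINGS
    }

for row, cells in table().items():
    print(f"{row:8}", *(f"{word:8}" for word in cells.values()))
```

For your own language, swap in your own beginnings and endings, then knock a few holes in the result if you want it to look natural.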

In some languages, like Russian, the interrogative pronouns (‘Who did it?’) and the relative pronouns (‘the man who did it’) are different.

Generally, if nouns decline, these pronouns decline the same way. Sometimes they’re worse— English, for instance, retained separate ‘from’ and ‘to’ forms for pronouns of place (hence = from here / hither = to here) long after such distinctions were lost for ordinary nouns.

What are the numbers?

Are the numbers based on tens, or something else? Many human number systems are based on fives or twenties instead. My pronoun-happy aliens had a duodecimal system. Intelligent machines would surely prefer hexadecimal...

How do you form higher numbers? ‘Forty-three’, for instance, may be formed in several ways:

forty three
four three
forty with three
three and forty
four tens and three
eight fives and three
fifty less seven
twice twenty and three
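Several of the patterns above are the same arithmetic with a different base or direction. A small sketch, with made-up function names, splits a number additively ("four tens and three") or subtractively ("fifty less seven"):

```python
def additive(n, base):
    """Split n into (count, base, remainder): n == count * base + remainder."""
    count, remainder = divmod(n, base)
    return count, base, remainder

def subtractive(n, base):
    """Split n into (count, base, shortfall): n == count * base - shortfall."""
    count, remainder = divmod(n, base)
    if remainder == 0:
        return count, base, 0
    return count + 1, base, base - remainder

# 'four tens and three', 'eight fives and three', 'twice twenty and three'
for base in (10, 5, 20):
    count, b, rest = additive(43, base)
    print(f"{count} x {b} + {rest}")

# 'fifty less seven'
count, b, short = subtractive(43, 10)
print(f"{count} x {b} - {short}")
```

From there, naming a number is just a matter of deciding which words and connectives your language uses for each piece.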

Where nouns decline, numbers may also. Or they may not. In Latin, you stop declining the numbers at four.

In Indo-European languages we are used to unanalyzable roots for the numbers; but in other families number names are derivations, often related to the process of counting on fingers and toes— e.g. Choctaw 5 = tahlapi ‘the first (hand) finished’; Klamath 8 ndan-ksahpta ‘three I have bent over’; Unalit 11 atkahakhtok ‘it goes down (to the feet)’; Shasta 20 tsec ‘man’ (considered as having 20 countable appendages).

For more on numbers, see the Sources page of my Numbers from 1 to 10 in Over 2000 Languages page.

What about adjectives?

Adjectives can be something like nouns, something like verbs, or like neither. If they’re like nouns, they generally agree with their head noun in gender, case, and number. If they’re like verbs, they conjugate like verbs.

How are comparative expressions ("holier than thou", "most holy", "as holy as thou") formed?

It’s useful to have some regular derivations for or from adjectives:

opposite (un-)
lack (-less) or surfeit (-ful)
possibility (-able)
liking (-phile) or disliking (-phobe)
relating to a place or language (-er, -ian, -an, -ese)
weakening of meaning (-ish)
strengthening of meaning (to the max)
adverb (-ly)

Are there articles?

English nouns feel a little naked without an article: definite ‘the’ or indefinite ‘a(n)’. In the plural we leave the indefinite article out (‘dogs’), but in Romance languages the indefinite article can be pluralized (Spanish unos perros).

Many languages, such as Latin and Russian, get by quite happily without them.

It may help to understand what the distinction really means. Ordinarily it’s pragmatic: the can be paraphrased ‘You know which one I’m talking about’. Consider:

I saw a man at the rodeo. The man had on a horrid plaid suit.

A man in the first sentence signals that this character is being introduced in this conversation; the in the second sentence signals that he’s old news, he is in fact the same guy we just started talking about. The before rodeo also indicates that the speaker expects that the hearer can figure out which rodeo— if not, he’d have said a rodeo.

Word order serves the same function in Russian. There you’d say, in effect,

I saw man in rodeo. Man wore horrid plaid suit.

When he’s introduced, the man lives near the end of the sentence; when he’s old news, he appears at the front.

(Actually, they don’t have many rodeos in Russia.)

What order do the components of a noun phrase appear in?

Consider articles, numbers, quantifiers, adverbs, adjectives, possessives, subordinate clauses— e.g.

The ten very happy robots who passed the bar exam

You can generally divide phrases into heads and modifiers. Some languages are very consistent about placing all modifiers before, or all after the head. English is head-final, with the exception of subordinate clauses. Japanese is head-final too, but it’s more consistent: it would say "bar-exam passed ten robots".

What order do the components of a sentence appear in?

Linguists like to talk about the order of subject, object, and verb, which of course can occur in just six combinations: SVO (as in English or Swahili), SOV (Latin, Quechua, Turkish), VSO (Welsh), OVS (Hixkaryana), OSV (Apurinã), VOS (Malagasy). The last three are for some reason rare, although they do exist.
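The six orders are simply the permutations of three items, which makes it easy to try out a sample sentence in each. A minimal sketch follows; the example languages are the ones named above, while the sample clause and the function name are invented for illustration.

```python
from itertools import permutations

# Example languages are the ones named in the text.
EXAMPLES = {
    "SVO": ["English", "Swahili"],
    "SOV": ["Latin", "Quechua", "Turkish"],
    "VSO": ["Welsh"],
    "OVS": ["Hixkaryana"],
    "OSV": ["Apurinã"],
    "VOS": ["Malagasy"],
}

def linearize(clause, order):
    """Arrange a clause's parts (keyed 'S', 'V', 'O') in the given order."""
    return " ".join(clause[slot] for slot in order)

orders = ["".join(p) for p in permutations("SVO")]
clause = {"S": "the bear", "V": "drinks", "O": "beer"}
for order in orders:
    print(order, "|", linearize(clause, order), "|", ", ".join(EXAMPLES[order]))
```

Reading the six versions aloud is a quick way to get a feel for an order before you commit to it.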

Combinations and complications are common; for instance, simple German sentences are SVO, but subordinate clauses are SOV:

Wer seine Finanzen im Griff hat, ist einfach entspannter.
Whoever has his finances in order is simply more relaxed.

But if there’s an auxiliary, it appears right after the subject, while the participle or infinitive moves to the end:

Mein Vater ist vor einigen Tagen nach London gefahren.
My father traveled to London several days ago.

(It’s really more complicated than that, but that’s the basics!)

"Subject" and "object" may work differently in languages with ergativity or topicalization.

In Flaidish, a topic can be expressed that isn’t a grammatical constituent of the sentence:

Luckit teeren Verduria zys kematt nellit.
Among human cities, Verduria is pretty nice.

How do you form yes-no questions?

English has a rather baroque procedure (inverting subject and verb). Other languages simply make use of a rise in intonation, or add a particle at the beginning of the sentence (e.g. Polish czy) or to the verb.

Many languages offer ways of suggesting the answer to the question. For instance, the Latin particle num expects the answer ‘no’ (Num ursi cerevisiam imperant? Bears don’t order beer, do they?), while nōnne expects ‘yes’ (Nōnne ursus animal implūme bipēs? Bears are featherless bipeds, aren’t they?).

Where questions are formed by appending a particle (e.g. -ne in Latin, or -chu in Quechua), the particle can be added directly to the word being questioned. We can only achieve the same effect in English by emphasis (Is the BEAR drinking beer? Is the bear drinking BEER?) or by rearrangement (Is it beer that the bear is drinking?).

One way of asking a question in Chinese is to offer the listener a choice: Nǐ shì bu shì Běijīng rén? "You’re from Beijing?", literally "You be, not be from Beijing?"

Some folks, believe it or not, get by without having words for ‘yes’ or ‘no’. The usual workaround is to repeat the verb from the question: "Do you know the way to San José?" can be answered "I know" or "I don’t know", as in Portuguese:

—Você conhece o caminho que vai a São José?
—Conheço. [’I know’]

How about other questions?

English usually moves the question word to the beginning of the sentence, but other languages don’t, asking in effect “You said what?” or “She’s going out with whose boyfriend?”

Also note that some languages have different pronouns for relative clauses (“The man who fishes”) and questions (“Who is this man?”).

How do you negate a sentence?

Again, there are many options:

add a particle before the verb (as in Russian or Spanish)
...or after the verb (as we used to do: thou rememberest not?),
...or both (French je ne sais pas)
use a special mood of the verb (Japanese nageru ‘throw’, nagenai ‘not throw’)
add a particle at the beginning or end of the sentence (e.g. Quechua mana, which however also requires a supporting suffix on the verb)
insert a special verb and negate that, as English does
use a special inflected auxiliary (e.g. Finnish e-)— it’s as if ‘not’ was an inflected verb: I not, you not, he nots...

These can be mixed, as in English: auxiliaries are directly negated with -n’t, while other verbs require do-support: inserting ‘do’ and negating that.

How do conjunctions work?

Conjunctions allow constituents to be paired, and express various relationships between them, e.g. English and, or, but, then. (But has the same meaning as and, but expresses contrast or surprise.)

Latin has a neat trick: to express X and Y, you can say X Y-que, using a clitic. The expression SPQR, Senātus Populusque Rōmānus, is an example of this construction: the Senate and the People of Rome.

Latin also distinguishes inclusive and exclusive or: vel X vel Y means that you can have X or Y or both, but aut X aut Y means you get one or the other but not both.
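In logical terms this is the difference between inclusive or and exclusive or. A tiny sketch, borrowing the Latin words as function names purely for illustration, prints the truth table:

```python
def vel(x, y):
    """Inclusive or: true if either or both hold."""
    return bool(x) or bool(y)

def aut(x, y):
    """Exclusive or: true if exactly one holds."""
    return bool(x) != bool(y)

for x in (False, True):
    for y in (False, True):
        print(x, y, vel(x, y), aut(x, y))
```

The two differ only in the last row, where both X and Y hold.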

Quechua (before the Spanish conquest) got by without conjunctions at all. For adding things together, you can usually get by with juxtaposition. Or you can use a case ending meaning with: in effect you say ‘X and Y’ by saying ‘X with Y’. I’m not sure how disjunctions (‘or’) were handled; today Quechua uses forms borrowed from Spanish.

How do you form subclauses?

Subclauses are perhaps the most sophisticated aspect of syntax, allowing entire sentences to serve as constituents or modifiers. A few basic types:

Sentential arguments, where a verb takes an entire sentence as its subject (“That Grandma’s drunk surprises me”) or object (“He believes that you’re crazy”).
Special subordinators may form place and time adverbials: “when/where you were born”
A preposition can take a sentence as its object: “after you were born”
A sentence can modify a noun, forming a relative clause: “the man who ate a horse”

Quechua has an interesting way of forming relative clauses, using participles. For instance:

Chakra-y yapu-q runa-ta qaya-mu-saq
field-my plow-participle man-accusative call-movement.toward-I.future
I’ll call the man that plowed my field.

Rather than looking like an ordinary sentence (“the man plowed my field”), the subclause has the form of a participle (“the my-field-plowing man”).

Mandarin can subordinate any clause (and indeed many other things) with the particle de:

Wǒmen gěi tā shōuyīnjī le.
We gave him a radio.
→ wǒmen gěi tā de shōuyīnjī
the radio we gave him

If your language has cases, you must be careful to put the pronouns in the right case— English doesn’t give you the right instincts here, now that whom is used only by pedants. In Latin Quod fēcit sapiō “I know what he did”, quod ‘what’ is in the accusative, as it’s what was done, while in Virum quī fēcit sapiō “I know the man who did it”, quī ‘who’ is in the nominative.

Transformations

It can be useful to think about relative clauses using transformations. For instance, a sentence like

The man that John hit yesterday prefers beer to wine.

can be seen as deriving by transformation from one sentence that’s embedded in another:

The man [John hit him yesterday] prefers beer to wine.

In English, you can think of relativization as proceeding in two steps:

replacing the pronoun in the subclause with an interrogative pronoun (or that)
The man [John hit whom yesterday] prefers beer to wine.
moving that pronoun to the head of the clause
The man [whom John hit yesterday] prefers beer to wine.
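The two steps can be run mechanically on a list of words. This is only a sketch of the transformation as described here, with invented function names; it treats sentences as flat word lists and knows nothing about syntax.

```python
def replace_pronoun(clause, pronoun, relative):
    """Step 1: swap the first occurrence of the pronoun for a relative pronoun."""
    i = clause.index(pronoun)
    return clause[:i] + [relative] + clause[i + 1:]

def front(clause, word):
    """Step 2: move the given word to the head of the clause."""
    i = clause.index(word)
    return [word] + clause[:i] + clause[i + 1:]

subclause = "John hit him yesterday".split()
step1 = replace_pronoun(subclause, "him", "whom")
step2 = front(step1, "whom")

sentence = ["The", "man"] + step2 + "prefers beer to wine.".split()
print(" ".join(step1))
print(" ".join(step2))
print(" ".join(sentence))
```

A language that skips step 2, or that leaves the original pronoun in place, would need only one of these functions.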

Your language may also put limits on what exactly can be relativized. The following examples are legal in English, for instance, but not in certain other languages.

the girl [you think [I love her]]
→ the girl you think I love
the neighbor [I traumatized his pastor]
→ the neighbor whose pastor I traumatized
the cat [I said [Alesia brought it home]]
→ the cat that I said Alesia brought home

Not everything is possible in English:

This is the man [my girlfriend’s father is a friend of John and him]
→ This is the man that my girlfriend’s father is a friend of John and.

or (thanks to Leo Connolly for this example)

There’s the barn [more people have gotten drunk down in back of it than any other barn in the county]
→ There’s the barn that more people have gotten drunk down in back of than any other barn in the county.

Some languages can handle such sentences simply by leaving the pronoun in the subclause. S.J. Perelman liked to do this in English:

“That’s the man which my wife is sleeping with him!”

Some other constructions that can be thought of as transformations:

Passives: John ran the band → the band was run by John
Fronting: John ran the band → The band, John ran it
Clefting: John ran the band → It’s John that ran the band
Causatives: John made [the band played Van Halen] → John made the band play Van Halen
Raising: It’s easy [John runs the band] → It’s easy for John to run the band
Nominalization: John ran the band → John’s running of the band

The grammar of my conlang Axunašin has a very extensive section on transformations.

Style

A natural language has a wide variety of registers, or styles of speech: from the ceremonial or ritual, to the official or scientific, to the journalistic or novelistic, to ordinary conversation, to colloquial, to slang. Children talk in their own way; so do poets. The upper crust speaks differently from the lower classes.

Some of these registers work in predictable ways. For instance, rites are often conducted in an archaic form of the language (or sometimes another language entirely). Educated speech usually includes older, longer, foreign, or technical words. In Verdurian, for instance, educated speech borrows many words from the parent language, Caďinor.

Slang often provides humorous substitutions for common words. Some such substitutions from Vulgar Latin have become the normal word in the Romance languages: testa ‘pot’ replaced caput ‘head’, giving French tête; bucca ‘cheek’ replaced os ‘mouth’, giving bouche; caballus ‘nag’ replaced equus ‘horse’, giving cheval.

Slang also borrows from minority groups: e.g. French toubib, chnouf, bled from Arabic; English shiv and pal from Romani, schlock from Yiddish, jazz and jive from Black slang; Spanish calato and cachaco from Quechua.

Politeness

All cultures have ways of expressing politeness, but they differ in the methods used, and in what ways politeness is grammaticalized.

According to Anna Wierzbicka, polite speech in English lays great stress on respecting others and avoiding imposition. English has a vast array of indirect forms for asking people to do things, or even for offering them things: Will you have a drink? Would you like a drink? Sure you wouldn’t like a beer? Why don’t you pour yourself something? How about a beer? Aren’t you thirsty? We’re so used to such pseudo-questions that we use them rather than a direct imperative even when actual politeness is far from our minds: Will someone put this fucking idiot out of his misery? For Christ’s sake, will you get lost?

In Polish, by contrast, a courteous host pushes his hospitality on the guest, dismissing the guest’s expressed remonstrances and desires as irrelevant: Prosze bardzo! Jeszcze troszke! —Ale juz nie moge! —Ale koniecznie! "Please, a little more!" "But I can’t!" "But you must!" And Polish is very free with imperatives— indeed, to be really forceful you must use the infinitive instead.

Japanese is often even more indirect than English: e.g. it avoids the imperative "Drink Coca-Cola!" in favor of Koka kora o nomimashou! (lit. "We will drink Coca-Cola!").

Japanese is also notable for having verbal inflections which add a level of politeness (e.g. tetsudau ‘helps’; polite form tetsudaimasu), as well as entirely different lexical items with the same purpose (e.g. iku ‘go’, humble form mairu, honorific irassharu).

Terms of address are a fertile field for exquisite complications; so are pronouns. In quite a few languages it’s perceived as rather a familiarity to address someone using the second person pronoun: to be polite you use the plural (French vous), or a third-person form (Italian Lei, Spanish Usted from vuestra merced ‘your mercy’, Portuguese o senhor ‘the gentleman’), or a title (Japanese sensei ‘teacher’, otōsan ‘father’, etc.). If this seems odd, it’s worth noting that English took the first approach, so thoroughly that the second person singular pronoun ‘thou’ disappeared.

Attempts have been made to formulate universals of politeness, but this can be tricky. E.g. it’s been suggested that politeness involves avoiding disagreement; but in Jewish culture disagreement expresses sociability and is taken as bringing people closer together. Or, it’s been said that direct praise of oneself is avoided, and praise of others is approved; but self-praise among Black American speakers is good form, and direct praise of others is avoided in Japanese.

Poetry

For poetry you must consult your own Muse. However, it’s worth pointing out that rhyme is not the only thing poetry can be based on:

Old English verse was based on alliteration.
Latin and Greek poetry was based on quantity, that is, patterns of long and short vowels.
Blank verse, of course, is based on patterns of stress, without having to rhyme.
French verse is generally based on lines of a certain syllable length, e.g. the alexandrine, of twelve syllables. Similarly, the haiku is composed of three lines, of 5, 7, and 5 syllables each.
Ancient Hebrew poetry was based on parallelism, the near repetition of an idea ("But let justice roll down like waters, and righteousness like an ever-flowing stream."), or on successive sentences or verses each beginning with a different letter (notably Psalm 119).

It’s also worth thinking about the goals of the poet. Is he aiming at grandeur? Historical allusion? Wit? Startlingness?

Is poetry a popular art, like rap? If so, it probably stays fairly close to colloquial speech. If it’s a rarefied exercise, it may either maintain archaic forms or experiment with the language.

Finally, think about what foreign cultures influenced your culture’s poetry. Latin borrowed many Greek meters; and European poetry has been deeply influenced by Latin.

Semantics and Pragmatics

Some of the most interesting bits of linguistics fall under semantics (which covers meaning) and pragmatics (which covers how languages are used in the real world, in context).

We’ve touched on these above, but for a more in-depth introduction, see my grammar of Xurnese.

Language families

You can add enormous depth to a fantasy language by giving it a history, and relatives. Verdurian and its sister languages Barakhinei, Ismaîn, and Sarroc all derive from Caďinor, as French and Spanish derive from Latin. Caďinor, Cuêzi, and Xurnese, in turn, all derive from Proto-Eastern, and thus are related in systematic ways, much as Latin, Greek, and Sanskrit all derive from Proto-Indo-European.

What can you do with such relationships?

Create doublets of words to enrich the language: one that derives from the ancient language and is worn down by millennia of sound change, one that has been borrowed more recently in its ancient form. Verdurian has doublets such as these:
fežir ‘hurl’ / pegeio ‘force’
sönil ‘saddle’ / asuena ‘seat’
žanec ‘coming’ / ctanec ‘future tense’
elut ‘fair play’ / aelutre ‘virtuous’
Create learned borrowings. Legal, scientific, medical, literary, and theological terms in Verdurian are often reborrowed from Caďinor: e.g. vocet ‘summons’; gutia ‘epilepsy’ (from a Caďinor word meaning ‘shaking’), menca ‘style, school’.
Verdurian has also borrowed educated terms from Cuêzi: avisar ‘school’, deyon ‘matter’, risunen ‘draw’. Moreover, some terms were borrowed directly from Cuêzi; others were borrowed from Cuêzi into Caďinor in ancient times, and then inherited in Verdurian: e.g. risunen ← risunden ← Cuêzi risonda ‘drawing’, ultimately from risi ‘reed pen’.
Set up borrowings from related languages, e.g. Verdurian kenek ‘camel’, borrowed from Barakhinei kêntek, derived from Caďinor kentos ‘plain’, which has also come down into Verdurian as kent. Čište ‘guitar’ was borrowed from Ismaîn, and is cognate with native sista ‘box’, both going back to Caďinor cista ‘box’.
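To see how a doublet falls out mechanically, here’s a minimal Python sketch. It rests on two assumptions: that the Caďinor source of the žanec / ctanec pair was simply ctanec (the borrowed form is said to keep its ancient shape), and that only one sound change matters for this word, the ct → ž change mentioned under Dialects below. The names INHERITED_CHANGES, inherit, and borrow are made up for illustration.

```python
import re

# One sound change only (an assumption): Caďinor ct becomes ž
# in words inherited into standard Verdurian.
INHERITED_CHANGES = [(r"ct", "ž")]


def inherit(parent_word):
    """Run the parent word through every sound change, in order."""
    word = parent_word
    for pattern, replacement in INHERITED_CHANGES:
        word = re.sub(pattern, replacement, word)
    return word


def borrow(parent_word):
    """A learned borrowing keeps the ancient form as it stands."""
    return parent_word


parent = "ctanec"  # assumed Caďinor form
doublet = {"inherited": inherit(parent), "borrowed": borrow(parent)}
print(doublet)
```

The same parent word yields two entries in the lexicon; you then give each its own meaning, as Verdurian does with ‘coming’ and ‘future tense’.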

Words often change meaning as they’re borrowed. Some cute examples from Verdurian:

čayma ‘tent’ ← Western chaimba ‘shelter’— because the shelters of the Western barbarians were in fact tents
dalu ‘king’ ← C. dalu ‘prince’— because when the Caďinorian empire fell, its princes each became independent rulers
garlo ‘sorcerer’ ← C. garorion ‘wise or clever man’; note the dissimilation of the two r’s; compare Latin arbor → Spanish árbol
kestora ‘natural philosophy’ ← C. kestora ‘the categories (of study)’
minyón ‘cute’ ← C. mingondul ‘beggar’ ← mingonda ‘large mat’, i.e. all that a beggar possessed
nočula ‘together’ ← C. nodatula ‘tied up’
ponyore ‘baritone’ ← Cuêzi pomioro ‘manly’

How do you do it?

To do this well you have to know something about historical linguistics. The sci.lang FAQ will give a brief overview. Better yet, read Theodora Bynon’s excellent Historical Linguistics, or R.L. Trask’s book of the same name, or Hans Henrich Hock’s more thorough Principles of Historical Linguistics.

The basic principle is that sound change is almost completely regular. This is good news: it means all you have to do is devise a set of sound changes between the parent language and its derivative(s), and apply them to each word.

Here, for instance, are just some of the sound changes from Caďinor to Verdurian; you can see the full set here.

loss of final -os: corsos → cos
p fricativizes to f before s or t: psis → fsiy
c becomes s before a front vowel, or before n: cisir → sisir; aracnis → arasni
g becomes ž before a front vowel: gina → žina
l becomes y between vowels: bileta → biyeta
nd, dr, lg, kr simplify to n, d, ly, ř respectively: sudrir → sudir, unge → unye
diphthongs normally simplify: aiďos → aď, caer → cer, Endauron → Enäron
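Because the changes are regular, they can be written down as an ordered list of rewrite rules and run over the whole lexicon. Here’s a minimal Python sketch of that idea. It’s only an illustration: it covers just the changes above whose examples need no further rule, it treats e and i as the front vowels and a e i o u as the vowels (both assumptions), and the names SOUND_CHANGES and apply_sound_changes are invented here.

```python
import re

# Ordered (pattern, replacement) pairs; each one is applied to the
# output of the one before it.
SOUND_CHANGES = [
    (r"c(?=[ein])", "s"),                # c > s before a front vowel or n
    (r"g(?=[ei])", "ž"),                 # g > ž before a front vowel
    (r"(?<=[aeiou])l(?=[aeiou])", "y"),  # l > y between vowels
    (r"dr", "d"),                        # dr simplifies to d
    (r"ae", "e"),                        # one diphthong simplification
]


def apply_sound_changes(word, changes=SOUND_CHANGES):
    """Apply every sound change to the word, in list order."""
    for pattern, replacement in changes:
        word = re.sub(pattern, replacement, word)
    return word


for parent in ["cisir", "gina", "bileta", "sudrir", "caer"]:
    print(parent, "→", apply_sound_changes(parent))
```

Order matters in a scheme like this. In the sketch, caer comes out as cer only because the c rule runs before the diphthong rule; reverse the two and the new e would trigger c → s.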

A different set of sound changes can be used to create a sister language. For instance, Barakhinei changes unvoiced consonants to voiced between vowels (this is an extremely common change in languages), loses the final sound of each word, etc. The net result is a language related to but subtly different from Verdurian:

gloss      Caďinor   Verdurian  Ismaîn   Barakhinei  Sarroc
walk       prosan    prosan     prozn    proza
lightning  molenia   molnia     moleni   molenhi     mlenoya
eagle      ueronos   örn        ŕone     feron       wieron
summer     aestas    esta       eşte     âshta       aisťa
go         laudan    lädan      luʐn     laoda       lawda
calm       geleia    želea      jeleʐe   gelech      glieȟa

If you’re interested in applying sound changes to one language in order to generate a descendant language, you may find my Sound Change Applier program useful.
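As a rough illustration of how sister languages fall out of one parent, here’s a Python sketch with one tiny rule set per daughter. It is not the Sound Change Applier, and the rule sets are drastically cut down: Barakhinei gets only the two changes named above (voicing between vowels, loss of the final sound), Verdurian only g → ž before a front vowel. It also assumes one letter equals one sound and that c voices to g; all the names are hypothetical.

```python
import re

VOICED = {"p": "b", "t": "d", "c": "g", "s": "z", "f": "v"}


def voice_between_vowels(word):
    """Voice an unvoiced consonant standing between two vowels."""
    return re.sub(r"(?<=[aeiou])[ptcsf](?=[aeiou])",
                  lambda m: VOICED[m.group()], word)


def drop_final_sound(word):
    """Lose the final sound (here simply the last letter)."""
    return word[:-1]


def soften_g(word):
    """g > ž before a front vowel."""
    return re.sub(r"g(?=[ei])", "ž", word)


# One ordered list of changes per daughter language.
DAUGHTERS = {
    "Verdurian": [soften_g],
    "Barakhinei": [voice_between_vowels, drop_final_sound],
}


def derive(parent_word, changes):
    """Apply each change in turn to the parent word."""
    word = parent_word
    for change in changes:
        word = change(word)
    return word


row = {name: derive("prosan", changes) for name, changes in DAUGHTERS.items()}
print(row)
```

For prosan this gives the Verdurian and Barakhinei forms shown in the table. The other rows need further changes that aren’t listed here, so don’t expect two rules to reproduce them.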

Dialects

You can use the same technique to create dialects for your language. Linguistically, dialects are simply a set of language varieties which haven’t diverged far enough apart that their speakers can’t understand each other. Dialects can be created simply by specifying a smaller number of less dramatic sound changes.

For instance, the Verdurian dialect of Avéle is characterized by the following changes:

Unstressed vowels are reduced to i (front vowels), schwa (back vowels), or vocalic r (before r)
Consonants between vowels become voiced: standard epese ‘thick’ becomes ebeze
Where Caďinor c changes to s in standard Verdurian, in Avéle it changes to š
Where Caďinor ct changes to ž in standard Verdurian, in Avéle it also changes to š
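A dialect rule of this kind is applied to the standard form rather than to the ancient one. Here’s a minimal Python sketch of just the second change, the voicing of consonants between vowels. The list of consonants that count as unvoiced, their voiced partners, and the vowel set are assumptions, and the name avele_voicing is invented for the example.

```python
import re

# Assumed unvoiced consonants and their voiced partners.
VOICING = str.maketrans("ptkcsfš", "bdggzvž")

# A single unvoiced consonant with a vowel on each side.
BETWEEN_VOWELS = re.compile(r"(?<=[aeiouäöü])[ptkcsfš](?=[aeiouäöü])")


def avele_voicing(standard_word):
    """Voice every single consonant that stands between two vowels."""
    return BETWEEN_VOWELS.sub(lambda m: m.group().translate(VOICING),
                              standard_word)


print(avele_voicing("epese"))
```

The vowel reduction rule would need to know where the stress falls, so it’s left out; a fuller version would carry stress marks in the lexicon.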

Dialects can also have their own lexical terms, of course, perhaps borrowed from neighbors or previous inhabitants of the local territory.

People often suppose that the dialect of the capital city (or whatever other place has supplied the standard language) is more ‘pure’ or more conservative than provincial speech. In fact the opposite is likely to be true: the active center of a culture will see its speech change fastest; rural or isolated areas are more likely to preserve older forms.

If you’re inventing an auxlang you may of course want to do everything possible to prevent the rise of dialects. This is probably an expression of the fascistic streak common to language tinkerers. Why not design your interlanguage with dialects, reflecting the phonology of various linguistic regions? The resulting language, with varieties close to the major natural languages, might achieve more acceptance than uniform interlanguages have.
