Once you’ve bundled together some words and perhaps an alphabet, you may think you’re done. If you do, it’s likely that you’ve just created an elaborate cipher for English. You still have the grammar to do, bucko.
To linguists, a grammar is a full description of a language, including:
I’ll touch only briefly on semantics and pragmatics, but we’ll talk about where to find more info.
Inflections are affixes used to conjugate verbs and decline nouns. Examples from English are the -s we add to verbs for the 3rd person present form, the -s added to pluralize nouns, and the -ed of the past tense. Languages such as Russian or Latin have complex, not to say baroque, inflectional systems.
In agglutinative languages, each affix has a single meaning. For instance, Quechua wasikunapi ’in the houses’; the plural suffix -kuna is separate from the case suffix -pi. Or mikurani ‘I ate’, in which the past tense suffix -ra- is kept separate from the personal ending -ni.
By contrast, in fusional languages, a single inflection may encode multiple meanings. For instance, in the Russian домов domóv, the -óv ending indicates both plurality and the genitive case; it doesn’t bear any evident relationship with other plural endings (e.g. nominative -á) or the singular genitive ending (-a). In Spanish comí ‘I ate’, the -í ending indicates the 1st person singular, past tense, indicative mood— quite a job for one vowel, even accented.
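The difference between the two strategies can be sketched in a few lines of Python. This is only a toy illustration: the function names and the feature labels (such as "locative" for -pi) are my own assumptions, and the tables hold nothing but the Quechua and Spanish forms quoted above.

```python
# Agglutinative: one suffix per meaning, attached one after another.
# Only the two Quechua suffixes quoted in the text are listed.
QUECHUA_SUFFIXES = {"plural": "kuna", "locative": "pi"}

def agglutinate(root, features):
    """Append one suffix per feature, in the order given."""
    return root + "".join(QUECHUA_SUFFIXES[f] for f in features)

# Fusional: a single ending is looked up by a whole bundle of meanings.
# Only the one Spanish ending quoted in the text is filled in.
SPANISH_ENDINGS = {("1sg", "past", "indicative"): "í"}

def fuse(stem, person, tense, mood):
    """Attach the one ending that encodes person, tense, and mood together."""
    return stem + SPANISH_ENDINGS[(person, tense, mood)]

print(agglutinate("wasi", ["plural", "locative"]))  # wasikunapi
print(fuse("com", "1sg", "past", "indicative"))      # comí
```

In the agglutinative table you can add a new meaning by adding one row; in the fusional table every new combination of meanings needs its own entry.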
In isolating languages, there are no suffixes at all; meanings are modified by inserting additional words. In Chinese, for instance, wǒ chī fàn could mean ‘I eat’ or ‘I was eating’, depending on the context; the verb is not inflected at all. For precision, adverbs or particles can be brought in: wǒ chī fàn zuótiān ‘I was eating yesterday’, wǒ chī fàn le ‘I’ve eaten (i.e. I ate and finished)’.
Polysynthetic languages incorporate nouns or other roots within the verb. For instance, Nishnaabemwin naajmiijme ‘fetch food’ incorporates miijim ‘food’. The incorporated form may differ from the noun normally used as a standalone word.
In practice natural languages are all a bit mixed; some inflections in fusional languages have a single meaning; Quechua does have a few fused inflections, and Mandarin does have a few suffixes.
Conlang creators seem to gravitate toward agglutinative or isolating languages; but there’s something to be said for fusional inflections. They tend to be compact, for instance. You can’t beat -í for succinctness.
In the following sections, be aware that the possible approaches may include inflections, separate particles, word order, and more. So (say) the negative may belong to the morphology in one language, to syntax in another.
Why not get rid of one or two of them?
It’s not hard to get rid of adjectives. One easy way is to treat them as verbs: instead of saying "The wall is red", you say "The wall reds"; likewise, instead of "the red wall" you say "the redding wall".
With such tricks you can even get rid of the verb be, which according to some theorists is responsible for most of the sloppy thinking in the world today. (Heinlein was careful to ban ‘to be’ from Speedtalk.) About the only response this notion deserves is: would that clear thinking was that easy.
You can extend the idea to get rid of nouns. For instance, in Lakhota, ethnic names are verbs, not nouns. There’s a verb ‘to be a Lakhota’: the present forms mean ‘I am a Lakhota, you are a Lakhota, etc.’
You can have some fun with this. "The rock is under the tree" could be expressed as something like "There is stonying below the growing, greening, flourishing", or perhaps "It stones whileunder it grows greeningly." If we really encountered a language like this, however, I’d have to wonder whether we weren’t just fooling ourselves. If there’s a word that refers to stones, why translate it as ‘to stone’ rather than simply ‘stone’?
Jorge Luis Borges, in "Tlön, Uqbar, Orbis Tertius", posits a language without nouns; but this was because its speakers were Berkeleyan idealists, who didn’t believe in object permanence. However, linguists really do not like using semantic classes (or metaphysics) to define syntactic categories. (It’s not the right level of analysis; and it tends to obscure how languages really work by making them all look like Latin.)
Jack Vance (in The Languages of Pao) posited a language without verbs. For instance, "There are two matters I wish to discuss with you" comes out something like "Statement-of-importance — in-a-state-of-readiness— two; ear— of [place name]— in-a-state-of-readiness; mouth— of this person here— in-a-state-of-volition." Vance may be in a state of pulling our legs.
What’s case? It’s a way of marking nouns by function: e.g. Latin
Our possessives (’world’s’) started out as genitive case forms, though they’re really particles today. Most of our pronouns still have nominatives and accusatives (I vs. me, we vs. us).
Conlang enthusiasts generally either love case (because it makes a language compact and frees up word order) or hate it (because English doesn’t do much with it).
Not all case systems work the same way. Consider these roles:
A. subject of transitive sentences: I broke the window
B. object of transitive sentences: I broke the window
C. subject of intransitive sentences: The window broke
English and Latin treat A and C alike, using the nominative, and mark B with the accusative. But some languages, such as Basque, group B and C together as the absolutive case, leaving A in the ergative case. (In a way it’s more logical... after all, the window always has the same semantic role, so in ergative/absolutive languages it always has the same case.)
If you think that’s weird, a few languages, such as Dyirbal, use the nominative/accusative system for 1st and 2nd person pronouns (I, we, you), and the ergative/absolutive system for nouns and for 3rd person pronouns.
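If it helps to see the two systems side by side, here is a small Python sketch. The function names and the role labels are assumptions made for illustration (A for the subject of a transitive sentence, B for its object, C for the subject of an intransitive one), and the split function is only a rough model of the kind of system described for Dyirbal, not a description of Dyirbal grammar.

```python
def case_of(role, system):
    """Return the case for a role under a given alignment.

    role:   "A" (transitive subject), "B" (transitive object),
            "C" (intransitive subject)
    system: "accusative" or "ergative"
    """
    if role not in ("A", "B", "C"):
        raise ValueError(f"unknown role: {role}")
    if system == "accusative":
        # A and C pattern together; B is marked separately.
        return "accusative" if role == "B" else "nominative"
    if system == "ergative":
        # B and C pattern together; A is marked separately.
        return "ergative" if role == "A" else "absolutive"
    raise ValueError(f"unknown system: {system}")

def split_case_of(role, is_first_or_second_person_pronoun):
    """Toy split system: the alignment depends on what kind of word it is."""
    system = "accusative" if is_first_or_second_person_pronoun else "ergative"
    return case_of(role, system)

for role in "ABC":
    print(role, case_of(role, "accusative"), case_of(role, "ergative"))
```

Note that the window in "I broke the window" (B) and in "The window broke" (C) comes out absolutive both times under the ergative system, but accusative and then nominative under the other.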
You can have case without inflections, by using particles— e.g. Japanese o marks the accusative, no the genitive.
If a language doesn’t have case it may rely on word order to indicate the relationship between a verb’s arguments; but there is another alternative: head-marking on the verb. For instance, in the Swahili Kitabu umekileta? ‘Did you bring the book?’, the verb leta has prefixes indicating the subject (u- ‘you’) and the object (-ki-, a third person prefix agreeing in gender with kitabu). (-me marks the perfect tense.) The gender-specific object marker on the verb allows free word order even without case marking on the nouns.
Gender need not be simply masculine/feminine. Swahili, for instance, has eight gender classes, none of them masculine/feminine: one is for animals, one for human beings, one for abstract nouns, one forms diminutives, etc. Algonquian languages have animate/inanimate genders instead. For a conlang I created physical/spiritual genders.
Conlangers used to avoid gender, back when they were mostly creating auxlangs. But it’s a nice addition to a naturalistic language; Verdurian has masculine and feminine gender.
People ask, what is gender for? Gender is remarkably persistent: it’s persisted in the Indo-European, Semitic, and Bantu language families for at least five thousand years. It must be doing something useful.
A few possibilities:
Like case, personal endings make for nice compact sentences, since if you have them you can generally omit subject pronouns. Here’s an example from Spanish; note that English has a remnant of person/number agreement with the -s ending.
Some languages, such as Swahili and Quechua, include the object pronoun in the verb as well, usually as an infix. Quechua rimasunki means ‘he is speaking to you (s.)’.
The Romance languages have clitic forms of the pronouns, which stop just short of being verb inflections: e.g. French Je le vois, ‘I see him’; Spanish Dígame, ‘Tell me’.
Basque verbs can inflect to encode information about the listener. For instance, ekarri digute is a neutral way of saying ‘They brought it to us’; ekarri zigunate means the same, but also indicates that the listener is a woman addressed with the informal personal pronoun.
Some distinctions languages make on their verbs:
Languages also differ in how many distinctions are made in these categories.
The basic, universal persons are first (referring to the speaker), second (the hearer), and third (everybody else), and usually there are separate singular and plural forms. Turkish neatly fits this six-cell grid:
However, there’s lots of room to play around. Distinctions may be made:
It’s possible to bag the third person by using demonstratives instead (this one, that one). Many cultures seem to feel that raw pronouns are a little impolite, and use titles instead. Miss Manners informs us that the Holy Roman Emperor properly referred to himself as ma majesté.
I invented an alien race once that used different pronouns on land and underwater (they were amphibians), and had the inclusive/exclusive and proximate/obviative distinctions. They also had a pronoun for group minds, and pronouns for each of their three sexes. The complete list was impressive.
The first column comprises interrogative pronouns; the second two are demonstratives, and the rest are indefinite pronouns. The adjectives no, some, most, every are quantifiers.
It’s easy and diverting to regularize the table, although natural languages generally leave holes, which must be filled in with phrases (’in that way’, ‘for no reason’).
In some languages, like Russian, the interrogative pronouns (’Who did it?’) and the relative pronouns (’the man who did it’) are different.
Generally, if nouns decline, these pronouns decline the same way. Sometimes they’re worse— English, for instance, retained separate ‘from’ and ‘to’ forms for pronouns of place (hence = from here / hither = to here) long after such distinctions were lost for ordinary nouns.
Are the numbers based on tens, or something else? Many human number systems are based on fives or twenties instead. My pronoun-happy aliens had a duodecimal system. Intelligent machines would surely prefer hexadecimal...
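To get a feel for a different base, it can help to see how an ordinary number breaks down in it. The following is a minimal Python sketch; the function name to_base is my own, and it only produces the digit values, leaving the actual number words up to you.

```python
def to_base(n, base):
    """Return the digits of a non-negative integer n in the given base,
    most significant digit first."""
    if n < 0 or base < 2:
        raise ValueError("need n >= 0 and base >= 2")
    digits = []
    while True:
        n, remainder = divmod(n, base)
        digits.append(remainder)
        if n == 0:
            break
    return digits[::-1]

# How forty-three decomposes in bases 5, 10, 12, 16, and 20.
for base in (5, 10, 12, 16, 20):
    print(base, to_base(43, base))
```

In base 20, for instance, forty-three comes out as [2, 3], that is, two twenties and three; in base 12 it is [3, 7], three dozens and seven.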
How do you form higher numbers? ‘Forty-three’, for instance, may be formed in several ways:
Where nouns decline, numbers may also. Or they may not. In Latin, you stop declining the numbers at four.
In Indo-European languages we are used to unanalyzable roots for the numbers; but in other families number names are derivations, often related to the process of counting on fingers and toes— e.g. Choctaw 5 = tahlapi ‘the first (hand) finished’; Klamath 8 ndan-ksahpta ‘three I have bent over’; Unalit 11 atkahakhtok ‘it goes down (to the feet)’; Shasta 20 tsec ‘man’ (considered as having 20 countable appendages).
Adjectives can be something like nouns, something like verbs, or like neither. If they’re like nouns, they generally agree with their head noun in gender, case, and number. If they’re like verbs, they conjugate like verbs.
How are comparative expressions ("holier than thou", "most holy", "as holy as thou") formed?
It’s useful to have some regular derivations for or from adjectives:
Many languages, such as Latin and Russian, get by quite happily without them.
It may help to understand what the distinction really means. Ordinarily it’s pragmatic: the can be paraphrased ‘You know which one I’m talking about’. Consider:
I saw a man at the rodeo. The man had on a horrid plaid suit.
‘A man’ in the first sentence signals that this character is being introduced in this conversation; ‘the’ in the second sentence signals that he’s old news, he is in fact the same guy we just started talking about. The ‘the’ before ‘rodeo’ also indicates that the speaker expects that the hearer can figure out which rodeo; if not, he’d have said ‘a rodeo’.
Word order serves the same function in Russian. There you’d say, in effect,
I saw man in rodeo. Man wore horrid plaid suit.
When he’s introduced, the man lives near the end of the sentence; when he’s old news, he appears at the front.
(Actually, they don’t have many rodeos in Russia.)
Consider articles, numbers, quantifiers, adverbs, adjectives, possessives, subordinate clauses— e.g.
The ten very happy robots who passed the bar exam
You can generally divide phrases into heads and modifiers. Some languages are very consistent about placing all modifiers before, or all after the head. English is head-final, with the exception of subordinate clauses. Japanese is head-final too, but it’s more consistent: it would say "bar-exam passed ten robots".
Linguists like to talk about the order of subject, object, and verb, which of course can occur in just six combinations: SVO (as in English or Swahili), SOV (Latin, Quechua, Turkish), VSO (Welsh), OVS (Hixkaryana), OSV (Apurinã), VOS (Malagasy). The last three are for some reason rare, although they do exist.
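The six orders are just the permutations of three items, which is easy to check. Here is a short Python sketch that lists them and lays out one sample sentence in each; the constituent phrases and the function names are made up for illustration, and real languages of each type would of course differ in much more than order.

```python
from itertools import permutations

# A sample sentence, split into subject, verb, and object.
CONSTITUENTS = {"S": "the bear", "V": "drinks", "O": "beer"}

def all_orders():
    """Return every ordering of S, V, and O as a three-letter label."""
    return ["".join(p) for p in permutations("SVO")]

def render(order):
    """Lay out the sample sentence in the given order, e.g. 'SOV'."""
    return " ".join(CONSTITUENTS[c] for c in order)

for order in all_orders():
    print(order, "->", render(order))
```

Trying your own sentences in each order is a quick way to hear what an SOV or VSO conlang might feel like before you commit to one.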
Combinations and complications are common; for instance, simple German sentences are SVO, but subordinate clauses are SOV:
Wer seine Finanzen im Griff hat, ist einfach entspannter.
But if there’s an auxiliary, it appears right after the subject, while the participle or infinitive moves to the end:
Mein Vater ist vor einigen Tagen nach London gefahren.
(It’s really more complicated than that, but that’s the basics!)
"Subject" and "object" may work differently in languages with ergativity or topicalization.
In Flaidish, a topic can be expressed that isn’t a grammatical constituent of the sentence:
Luckit teeren Verduria zys kematt nellit.
English has a rather baroque procedure (inverting subject and verb). Other languages simply make use of a rise in intonation, or add a particle at the beginning of the sentence (e.g. Polish czy) or to the verb.
Many languages offer ways of suggesting the answer to the question. For instance, the Latin particle num expects the answer ‘no’ (Num ursi cerevisiam imperant? Bears don’t order beer, do they?), while nōnne expects ‘yes’ (Nōnne ursus animal implūme bipēs? Bears are featherless bipeds, aren’t they?).
Where questions are formed by appending a particle (e.g. -ne in Latin, or -chu in Quechua), the particle can be added directly to the word being questioned. We can only achieve the same effect in English by emphasis (Is the BEAR drinking beer? Is the bear drinking BEER?) or by rearrangement (Is it beer that the bear is drinking?).
One way of asking a question in Chinese is to offer the listener a choice: Nǐ shì bu shì Běijīng rén? "You’re from Beijing?", literally "You be, not be from Beijing?"
Some folks, believe it or not, get by without having words for ‘yes’ or ‘no’. The usual workaround is to repeat the verb from the question: "Do you know the way to San José?" can be answered "I know" or "I don’t know", as in Portuguese:
—Você conhece o caminho que vai a São José?
English usually moves the question word to the beginning of the sentence, but other languages don’t, asking in effect “You said what?” or “She’s going out with whose boyfriend?”
Also note that some languages have different pronouns for relative clauses (“The man who fishes”) and questions (“Who is this man?”).
Again, there are many options:
Latin has a neat trick: to express X and Y, you can say X Y-que, using a clitic. The expression SPQR, Senātus Populusque Rōmānus, is an example of this construction: the Senate and the People of Rome.
Latin also distinguishes inclusive and exclusive or: vel X vel Y means that you can have X or Y or both, but aut X aut Y means you get one or the other but not both.
Quechua (before the Spanish conquest) got by without conjunctions at all. For adding things together, you can usually get by with juxtaposition. Or you can use a case ending meaning with: in effect you say ‘X and Y’ by saying ‘X with Y’. I’m not sure how disjunctions (’or’) were handled— today Quechua uses forms borrowed from Spanish.
Quechua has an interesting way of forming relative clauses, using participles. For instance:
Chakra-y yapu-q runa-ta qaya-mu-saq
Rather than looking like an ordinary sentence (“the man plowed my field”), the subclause has the form of a participle (“the my-field-plowing man”).
Mandarin can subordinate any clause (and indeed many other things) with the particle de:
Wǒmen gěi tā shōuyīnjī le.
If your language has cases, you must be careful to put the pronouns in the right case; English doesn’t give you the right instincts here, now that whom is used only by pedants. In Latin Quod fēcit sapiō “I know what he did”, quod ‘what’ is in the accusative, as it’s what was done, while in Virum quī fēcit sapiō “I know the man who did it”, quī ‘who’ is in the nominative.
It can be useful to think about relative clauses using transformations. For instance, a sentence like
The man that John hit yesterday prefers beer to wine.
can be seen as deriving by transformation from one sentence that’s embedded in another:
The man [John hit him yesterday] prefers beer to wine.
In English, you can think of relativization as proceeding in two steps:
Your language may also put limits on what exactly can be relativized. The following examples are legal in English, for instance, but not in certain other languages.
the girl [you think [I love her]]
Not everything is possible in English:
This is the man [my girlfriend’s father is a friend of John and him]
or (thanks to Leo Connolly for this example)
There’s the barn [more people have gotten drunk down in back of it than any other barn in the county]
Some languages can handle such sentences simply by leaving the pronoun in the subclause. S.J. Perelman liked to do this in English:
“That’s the man which my wife is sleeping with him!”
Some other constructions that can be thought of as transformations:
A natural language has a wide variety of registers, or styles of speech: from the ceremonial or ritual, to the official or scientific, to the journalistic or novelistic, to ordinary conversation, to colloquial, to slang. Children talk in their own way; so do poets. The upper crust speaks differently from the lower classes.
Some of these registers work in predictable ways. For instance, rites are often conducted in an archaic form of the language (or sometimes another language entirely). Educated speech usually includes older, longer, foreign, or technical words. In Verdurian, for instance, educated speech borrows many words from the parent language, Caďinor.
Slang often provides humorous substitutions for common words. Some such substitutions from Vulgar Latin have become the normal word in the Romance languages: testa ’pot’ replaced caput ’head’, giving French tête; bucca ’cheek’ replaced os ‘mouth’, giving bouche; caballus ‘nag’ replaced equus ‘horse’, giving cheval.
Slang also borrows from minority groups: e.g. French toubib, chnouf, bled from Arabic; English shiv and pal from the Gypsies, schlock from Yiddish, jazz and jive from blacks; Spanish calato and cachaco from Quechua.
All cultures have ways of expressing politeness, but they differ in the methods used, and in what ways politeness is grammaticalized.
According to Anna Wierzbicka, polite speech in English lays great stress on respecting others and avoiding imposition. English has a vast array of indirect forms for asking people to do things, or even for offering them things: Will you have a drink? Would you like a drink? Sure you wouldn’t like a beer? Why don’t you pour yourself something? How about a beer? Aren’t you thirsty? We’re so used to such pseudo-questions that we use them rather than a direct imperative even when actual politeness is far from our minds: Will someone put this fucking idiot out of his misery? For Christ’s sake, will you get lost?
In Polish, by contrast, a courteous host pushes his hospitality on the guest, dismissing the guest’s expressed remonstrances and desires as irrelevant: Proszę bardzo! Jeszcze troszkę! —Ale już nie mogę! —Ale koniecznie! "Please, a little more!" "But I can’t!" "But you must!" And Polish is very free with imperatives; indeed, to be really forceful you must use the infinitive instead.
Japanese is often even more indirect than English: e.g. it avoids the imperative "Drink Coca-Cola!" in favor of Koka kora o nomimashou! (lit. "We will drink Coca-Cola!").
Japanese is also notable for having verbal inflections which add a level of politeness (e.g. tetsudau ‘helps’; polite form tetsudaimasu), as well as entirely different lexical items with the same purpose (e.g. iku ‘go’, humble form mairu, honorific irassharu).
Terms of address are a fertile field for exquisite complications; so are pronouns. In quite a few languages it’s perceived as rather a familiarity to address someone using the second person pronoun: to be polite you use the plural (French vous), or a third-person form (Italian Lei, Spanish Usted from vuestra merced ‘your mercy’, Portuguese o senhor ‘the gentleman’), or a title (Japanese sensei ‘teacher’, otōsan ‘father’, etc.). If this seems odd, it’s worth noting that English took the first approach, so thoroughly that the second person singular pronoun ‘thou’ disappeared.
Attempts have been made to formulate universals of politeness, but this can be tricky. E.g. it’s been suggested that politeness involves avoiding disagreement; but in Jewish culture disagreement expresses sociability and is taken as bringing people closer together. Or, it’s been said that direct praise of oneself is avoided, and praise of others is approved; but self-praise among Black American speakers is good form, and direct praise of others is avoided in Japanese.
For poetry you must consult your own Muse. However, it’s worth pointing out that rhyme is not the only thing poetry can be based on:
Is poetry a popular art, like rap? If so, it probably stays fairly close to colloquial speech. If it’s a rarefied exercise, it may either maintain archaic forms or experiment with the language.
Finally, think about what foreign cultures influenced your culture’s poetry. Latin borrowed many Greek meters; and European poetry has been deeply influenced by Latin.
We’ve touched on these above, but for a more in-depth introduction, see my grammar of Xurnese.
You can add enormous depth to a fantasy language by giving it a history, and relatives. Verdurian and its sister languages Barakhinei, Ismaîn, and Sarroc all derive from Caďinor, as French and Spanish derive from Latin. Caďinor, Cuêzi, and Xurnese, in turn, all derive from Proto-Eastern, and thus are related in systematic ways, much as Latin, Greek, and Sanskrit all derive from proto-Indo-European.
What can you do with such relationships?
Words often change meaning as they’re borrowed. Some cute examples from Verdurian:
To do this well you have to know something about historical linguistics. The sci.lang faq will give a brief overview. Better yet, read Theodora Bynon’s excellent Historical Linguistics, or R.L. Trask’s book of the same name, or Hans Henrich Hock’s more thorough Principles of Historical Linguistics.
The basic principle is that sound change is almost completely regular. This is good news: it means all you have to do is devise a set of sound changes between the parent language and its derivative(s), and apply them to each word.
Here, for instance, are just some of the sound changes from Caďinor to Verdurian; you can see the full set here.
A different set of sound changes can be used to create a sister language. For instance, Barakhinei changes unvoiced consonants to voiced between vowels (this is an extremely common change in languages), loses the final sound of each word, etc. The net result is a language related to but subtly different from Verdurian:
If you’re interested in applying sound changes to one language in order to generate a descendant language, you may find my Sound Change Applier program useful.
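The core idea is simple enough to sketch in a few lines of Python. To be clear, this is not the Sound Change Applier and does not use its rule format; it is a toy with two changes of the kinds mentioned above (voicing between vowels, loss of the final sound). The vowel and consonant inventories, the function names, and the sample words are all invented for illustration, and it assumes one letter per sound.

```python
import re

# Hypothetical inventories for a made-up parent language.
VOWELS = "aeiou"
VOICED = {"p": "b", "t": "d", "k": "g", "s": "z", "f": "v"}

# An unvoiced consonant with a vowel on each side.
_INTERVOCALIC = re.compile(
    f"(?<=[{VOWELS}])([{''.join(VOICED)}])(?=[{VOWELS}])"
)

def voice_between_vowels(word):
    """Change unvoiced consonants to voiced when they sit between vowels."""
    return _INTERVOCALIC.sub(lambda m: VOICED[m.group(1)], word)

def drop_final_sound(word):
    """Lose the last sound of the word (one letter = one sound here)."""
    return word[:-1] if len(word) > 1 else word

def apply_changes(word, changes):
    """Apply each sound change in order, as a regular rule on every word."""
    for change in changes:
        word = change(word)
    return word

CHANGES = [voice_between_vowels, drop_final_sound]

# Invented parent-language words, not real Caďinor vocabulary.
for parent in ["kapat", "metos", "sifa"]:
    print(parent, "->", apply_changes(parent, CHANGES))
```

The order of the rules matters: if the final sound were dropped first, a consonant that used to be between vowels could end up word-final and escape the voicing rule. Reordering the list is a cheap way to get a slightly different sister language.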
You can use the same technique to create dialects for your language. Linguistically, dialects are simply a set of language varieties which haven’t diverged far enough apart that their speakers can’t understand each other. Dialects can be created simply by specifying a smaller number of less dramatic sound changes.
For instance, the Verdurian dialect of Avéle is characterized by the following changes:
Dialects can also have their own lexical terms, of course, perhaps borrowed from neighbors or previous inhabitants of the local territory.
People often suppose that the dialect of the capital city (or whatever other place has supplied the standard language) is more ‘pure’ or more conservative than provincial speech. In fact the opposite is likely to be true: the active center of a culture will see its speech change fastest; rural or isolated areas are more likely to preserve older forms.
If you’re inventing an auxlang you may of course want to do everything possible to prevent the rise of dialects. This is probably an expression of the fascistic streak common to language tinkerers. Why not design your interlanguage with dialects, reflecting the phonology of various linguistic regions? The resulting language, with varieties close to the major natural languages, might achieve more acceptance than uniform interlanguages have.