One of the striking features of PIE is its reliance on vowel changes in conjugation; some of the rare survivals of this in English are verb paradigms such as sing/sang/sung. PIE had a rich system of inflections, including three numbers (singular/dual/plural) and three genders.
A readily available reference on Proto-Indo-European is the appendix of Indo-European roots at the back of the American Heritage Dictionary, which is quite interesting to anyone who cares about etymology. Why be satisfied with a derivation from Latin or Germanic when you can trace a word back to PIE?
Germanic. The earliest substantial Germanic text we have is a 4C Gothic translation of the Bible. The earliest English texts are from the 7C. English does not derive from German; rather, both derive from proto-Germanic.
Italic. In ancient times Latin was only one of several Italic languages spoken in Italy; others included Oscan, Umbrian, and Faliscan. Some of these survived into the 1C, but all the modern Romance languages are derived from Latin. The earliest texts in Romance languages are French, from the 9C.
We have an enormous corpus of ancient Latin; the earliest inscriptions date back to about 500 BC. For an introduction to Latin you couldn't do better than Humez & Humez's Latin for People, which contains such delightful sample sentences as Venimus ad Galliam sed non currimus, "We're coming to Gaul but we're not running", or Dulce et decorum est pro patria mori. Amarum et indecorum est a Vesuvio interfici, "It is a sweet and seemly thing to die for one's country. It is a bitter and unseemly thing to be buried by Vesuvius."
Celtic. Irish is an official language of Ireland, and public institutions are named in Irish.
The earliest records of any Celtic languages are 1C inscriptions in Gaulish.
Celtic numbers are preserved in counting sets called scores, used in counting sheep, counting stitches, and in children's games. Here's a set from the North Country: yan, tan, tethera, pethera, pimp, sethera, lethera, hovera, covera, dik.
Hellenic. Mycenaean Greek is the language of Linear B, which dates to the 14C BCE and was proven to be Greek by Michael Ventris in 1952. Linear B is a syllabary and has nothing to do with the Greek alphabet, which was invented centuries later.
Tocharian A and B are a pair of extinct languages once spoken in Xinjiang, whose existence came to light only in the 1890s.
Albanian was one of the later languages to be assigned to Indo-European; it has replaced a substantial portion of its inherited IE vocabulary with loanwords.
Slavic. The earliest Slavic inscriptions date back to the 9C.
Anatolian. The texts in Hittite, dating to the 17C BCE, are the oldest Indo-European texts we have, but were discovered only about a century ago. They provided the most spectacular confirmation of a historical-linguistic prediction-- namely Saussure's postulation of coefficients sonantiques, the so-called laryngeals, in Proto-Indo-European. These were not directly attested in any then-known IE language, but some of them actually turned up in Hittite. On the other hand, Hittite turned out to be more different from the other IE languages than expected, which has led to some re-evaluation of the protolanguage. Some people consider Hittite and Indo-European to be branches of an earlier "Indo-Hittite"; but my Indo-Europeanist consultant considers this a ploy to avoid having to integrate information from Hittite into IE.
Indo-Iranian. We have Old Persian inscriptions dating to the 6C BCE, and Sanskrit texts dating back to about 1000 BCE. In the 18C, European scholars newly familiar with Sanskrit recognized that it was related to Greek and Latin, and began a philological joyride that ended in the reconstruction of Proto-Indo-European (chauvinistically called Indogermanisch by the mostly German scholars involved). Early on, Sanskrit was assumed to be particularly close to the protolanguage, but it has since been realized that this is not the case. Linguists retain a reverence for the accuracy of the ancient Sanskrit grammars, such as Panini's (4C BCE).
Ardhamagadhi, one of the post-Sanskrit dialects or Prakrits, is the language of the Jain scriptures.
Semitic languages also have a long written history, starting with Akkadian around 3000 BCE. We have Canaanite inscriptions going back to the 20C BCE. The Tanakh, the Hebrew Bible, was written over a period of a millennium (1200-200 BCE).
The earliest Arabic inscriptions date to the 4C CE, but of course its classic text is the 7C Qur'a:n. Arab regions are noted for diglossia, in which the spoken and written languages are highly divergent. Throughout the Arab world the standard written language (also used for formal speech) is Classical Arabic, which no one speaks as a native language-- it must be learned in school. The spoken language has diverged greatly from this standard, and it also varies widely between countries; uneducated Arabs from different ends of the Arab world cannot communicate with each other.
The Egyptian family boasts some of the oldest written records (from 3000 BCE), as well as the longest recorded span, 4500 years-- Chinese won't equal the record of Ancient Egyptian until about 2700 CE. The Arabic spoken in Egypt today does not descend from Ancient Egyptian. The modern descendant of the pharaohs' language is Coptic, still used as a liturgical language by Egyptian Christians.
Nimbia, a dialect of Gwandara in the Chadic family, is notable for having a duodecimal number system. 12, not shown on the Numbers page, is tùni; 13 is tùni m̀bé da '12 + 1', 30 is gùme bi nì shídé '24 + 6', etc.
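The glosses '12 + 1' and '24 + 6' are just the quotient and remainder on division by 12. Here is a minimal sketch of that arithmetic. It produces numeric glosses only, not Nimbia words, and the function name is made up.

```python
def duodecimal_gloss(n: int) -> str:
    """Gloss n as a multiple of 12 plus a remainder, e.g. 30 -> '24 + 6'."""
    if n < 1:
        raise ValueError("expected a positive integer")
    twelves, rest = divmod(n, 12)
    if twelves == 0:
        return str(rest)          # 1-11 need no decomposition
    base = 12 * twelves           # 12, 24, 36, ...
    return f"{base} + {rest}" if rest else str(base)


for n in (12, 13, 30):
    print(n, "=", duodecimal_gloss(n))
```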
Niger-Congo cannot be considered a well-established family (though some of its subfamilies, such as Bantu, are). There is no reconstruction of Proto-Niger-Congo on a par with IE, Semitic, Austronesian, Algonquian, etc.
An interesting tidbit about Krongo: the numerals are verbs. (This is true of a few Amerind languages as well.)
Niger-Congo number systems are generally based on fives. The numbers 6-9, for example, are often 5 + 1-4. Sometimes the derivations have become obscured through sound change (compare Spanish once = 10 + 1) or through borrowing (e.g. Swahili has borrowed 6-9 from Arabic). Other derivations are possible as well. Sometimes there's a special word for 8 (itself perhaps derived from 'two fours'), and 9 = 8 + 1; there may likewise be a word for 6 used to derive 7. 9 and sometimes 8 may be expressed as '10 minus 1 (or 2)'.
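For 6-9, the alternatives just described can be summarized as arithmetic. The following is a hypothetical summary rather than the system of any particular language, since each language uses only some of these patterns.

```python
def derivations(n: int) -> list[str]:
    """Arithmetic patterns mentioned in the text for 6-9 (hypothetical summary)."""
    if not 6 <= n <= 9:
        raise ValueError("this sketch only covers 6-9")
    patterns = [f"5 + {n - 5}"]            # the usual five-based pattern
    if n == 7:
        patterns.append("6 + 1")           # when 6 has its own word
    if n == 8:
        patterns.append("2 * 4")           # 'two fours'
    if n == 9:
        patterns.append("8 + 1")           # when 8 has its own word
    if n >= 8:
        patterns.append(f"10 - {10 - n}")  # counting down from 10
    return patterns


for n in range(6, 10):
    print(n, derivations(n))
```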
For higher numbers, the Bantu languages tend to be organized by tens, the western languages by twenties.
The Yoruba number system is notable for its reliance on subtraction: e.g. 19 ookan din logun = 20 - 1, 46 = 60 - 10 - 4, 315 orin din nirinwo odin marun = 400 - (20*4) - 5.
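Part of the subtractive logic can be mechanized. The sketch below uses a simplified rule, which is an assumption and not a full account of Yoruba: units 5-9 are counted down from the next ten, units 1-4 are added, and a ten that is not a multiple of 20 is itself counted down from the next twenty.

```python
def subtractive_terms(n: int) -> list[int]:
    """Hypothetical, simplified Yoruba-style decomposition into signed terms."""
    if n < 11:
        raise ValueError("this sketch starts at 11")
    terms: list[int] = []
    units = n % 10
    if units >= 5:
        base = n + (10 - units)      # count down from the next ten: 19 = 20 - 1
        terms.append(-(10 - units))
    else:
        base = n - units             # small units are added to the ten below
        if units:
            terms.append(units)
    if base % 20 == 10:              # an odd ten comes from the next twenty: 50 = 60 - 10
        base += 10
        terms.insert(0, -10)
    return [base] + terms


def show(terms: list[int]) -> str:
    """Render signed terms as an expression like '60 - 10 - 4'."""
    text = str(terms[0])
    for t in terms[1:]:
        text += f" - {-t}" if t < 0 else f" + {t}"
    return text


for n in (19, 46, 315):
    print(n, "=", show(subtractive_terms(n)))
```

The rule reproduces 19 = 20 - 1 and 46 = 60 - 10 - 4. For 315, however, it gives 320 - 5, whereas Yoruba says 400 - (20*4) - 5. Counting down from 400 is beyond this simple rule.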
The word for 7 in Kumbundu (a Bantu language), sambuari, derives from 6 + 2-- this is a euphemism, replacing the original word for 7, which is taboo. If that seems strange, there are rumors of a major North American civilization in which buildings are built without a 13th floor.
As can be seen by comparing Johnston 1919 with the 1970s Tanzanian Language Survey, compound numbers for 6-9 are being replaced in many languages with the Swahili numbers (themselves from Arabic).
Qiangic. Information on this branch of Tibeto-Burman has only very recently come to the attention of Western scholars, thanks to Chinese research of the '80s and '90s. The extinct Tangut or Xixia language, which is amply attested in a logographic script from the 11C, is now thought to belong to this family.
People often think that linguists classify languages into families based on similar-sounding words. In fact the basis is regular sound correspondences between languages, whether the words sound the same or not. A neat example comes from the East Santo group: Sakao iedh and Shark Bay tharr don't sound at all alike, nor anything like proto-Vanuatu *vati. But they are in fact all cognates, and help demonstrate that these languages are related.
Linguist Jacques Guy has reconstructed the course of events in this way. Both languages changed bilabials to dentals before front vowels, and lost final vowels; thus *vati --> *thati --> *that.
In Sakao, there was furthermore a complex vowel shift; and then almost all consonants were lenited (weakened), voiceless stops to voiced fricatives, fricatives to approximants: *that --> *thet --> *yedh.
Finally, in Shark Bay, final -t became a trill: *that --> *tharr. QED.
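Guy's sequence of changes can be replayed mechanically as ordered string rewrites. This is only a toy. The rules are hand-written to reproduce this one derivation, and they leave out both the front-vowel conditioning of the first change and the details of the Sakao vowel shift. The names COMMON, SAKAO, and SHARK_BAY are illustrative.

```python
import re

# A step is a list of (regex, replacement) pairs applied in order.
Step = list[tuple[str, str]]

COMMON: list[Step] = [
    [(r"^v", "th")],        # bilabial > dental (conditioning not modelled)
    [(r"[aeiou]$", "")],    # loss of final vowels
]
SAKAO: list[Step] = [
    [(r"a", "e")],          # stand-in for the complex Sakao vowel shift
    [(r"th", "y"),          # lenition: fricative > approximant
     (r"t$", "dh")],        # lenition: voiceless stop > voiced fricative
]
SHARK_BAY: list[Step] = [
    [(r"t$", "rr")],        # final -t > trill
]


def derive(proto: str, steps: list[Step]) -> list[str]:
    """Apply each step in order, recording the form after every step."""
    forms = [proto]
    for step in steps:
        form = forms[-1]
        for pattern, repl in step:
            form = re.sub(pattern, repl, form)
        forms.append(form)
    return forms


print(" --> ".join(derive("vati", COMMON + SAKAO)))
print(" --> ".join(derive("vati", COMMON + SHARK_BAY)))
```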
...Bam, a Sepik-Madang language, is unusual in having a base-4 system. 10 is 'four-two and two', 12 is kiki tuol 'four-three', and so on. Oddly, 20 kiki lim uses the usual Austronesian morpheme for 5, but 5 itself doesn't: 5 is kiki be kubua 'four and one'.
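Bam's pattern amounts to dividing by four: 10 = 4 × 2 + 2. The sketch below produces English glosses in that style. Only the glosses for 5, 10, 12, and 20 come from the text; 20 = 'four-five' follows from kiki 'four' and lim 'five'. Everything else is extrapolation.

```python
WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five", 6: "six"}


def bam_gloss(n: int) -> str:
    """English gloss of n in a base-4 pattern (hypothetical extrapolation)."""
    if not 1 <= n <= 27:
        raise ValueError("this sketch covers 1-27")
    fours, rest = divmod(n, 4)
    if fours == 0:
        return WORDS[rest]
    # One four is just 'four' (5 = 'four and one'); more are 'four-<count>'.
    head = "four" if fours == 1 else f"four-{WORDS[fours]}"
    return f"{head} and {WORDS[rest]}" if rest else head


for n in (5, 10, 12, 20):
    print(n, "=", bam_gloss(n))
```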
The Kewa numbers represent just the beginning of a 24-member counting sequence. The first five numbers name the little finger through the thumb; but instead of continuing with the other hand, the count moves on to points a few inches apart along the body: 9 = 'forearm', 15 = 'shoulder', 20 = 'ear', 24 rikaa = 'between eyes'.
Kanum and Kimaghana seem to be base 6 systems.
Andaman. 3/4/5 in Aka-Bea-da etc. actually mean 'one more', 'some more', 'all'.
Many of the Australian languages have a limited set of numbers. (That doesn't mean they're simple languages-- they tend to be quite complex.) Some number words, as shown, represent not a single number but a range.
Some languages, like Yir Yoront, have a full set of numbers, yet we're told that most Australian languages stop at 2, 3, or 4. As in many languages, the number words in Yir Yoront refer directly to the process of counting on the hands: 5 = "whole hand", 7 = "hand entire, fingers two", 10 = "hand-two". It makes me wonder whether most fieldworkers are asking the wrong questions.
In Indo-European languages we are used to unanalyzable roots for the numbers; but in other families number names can be derivations, often related to the process of counting on fingers and toes-- e.g. Choctaw 5 = talhlhaapih 'the first (hand) finished'; Bororo 7 ikéra metúya pogédu 'my hand and another with a partner'; Klamath 8 ndan-ksahpta 'three I have bent over'; Unalit 11 atkahakhtok 'it goes down (to the feet)'; Shasta 20 tsec 'man' (considered as having 20 countable appendages).
Greenberg groups all the Amerindian languages below (that is, excluding Eskimo-Aleut and Na-Dené) into a single family, Amerind. His conclusions are based only on "mass comparison", not the comparative method, and are not accepted by Amerindianists.
The North American languages are well studied, and many families here are well established, often with reconstructed proto-languages. The same cannot be said for South America. Check back in fifty years.
Many Mexican, Central American, and Californian languages have number systems based not on 10's but on 20's. This is not always evident from the numbers from 11 to 19, some of which may be compounds as in a decimal system; but it becomes clear from higher numbers-- e.g. 100 is expressed as 'five twenties', and there are special words for powers of 20-- e.g. in Yucatec 20^1 through 20^6 are kal, bak, pic, calab, kinchil, alau.
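Place value in base 20 works just like decimal place value, with 20 in place of 10. The sketch below splits a number into base-20 places and labels them with the Yucatec power words listed above. It shows only the arithmetic, not how Yucatec speakers actually phrase compound numbers.

```python
YUCATEC_POWERS = {1: "kal", 2: "bak", 3: "pic", 4: "calab", 5: "kinchil", 6: "alau"}


def base20_places(n: int) -> list[tuple[int, int]]:
    """(digit, exponent) pairs for n in base 20, most significant first, zeros skipped."""
    if n < 1:
        raise ValueError("expected a positive integer")
    places: list[tuple[int, int]] = []
    exponent = 0
    while n:
        n, digit = divmod(n, 20)
        if digit:
            places.append((digit, exponent))
        exponent += 1
    return places[::-1]


def describe(n: int) -> str:
    """Label each nonzero base-20 place with its power word, e.g. 100 -> '5 x kal'."""
    parts = []
    for digit, exponent in base20_places(n):
        if exponent == 0:
            parts.append(str(digit))
        else:
            name = YUCATEC_POWERS.get(exponent, f"20^{exponent}")
            parts.append(f"{digit} x {name}")
    return " + ".join(parts)


for n in (100, 8421):
    print(n, "=", describe(n))
```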
The Mayan languages are notable for having a fully developed writing system, deciphered only in the 20th century, and for having a symbol for zero. For the story of the decipherment, see Michael Coe's Breaking the Maya Code.
The Incas exchanged accounting information by means of kipus (literally 'knots'), bundles of knotted strings. Each string recorded one or more numbers, and strings were grouped into color-coded bunches, sometimes with totals attached, as in a spreadsheet. The numerical code was decimal; each digit was represented by 0 to 9 knots; the units were made with a different sort of knot so that more than one number could be coded on one string.
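To make the idea concrete, here is a toy encoding. It assumes 's' stands for an ordinary knot, 'L' for the different kind of knot used in the units place, and '-' for the gap between place-value clusters. All three symbols are made up for illustration. Decoding relies only on the units knots to tell where one number ends and the next begins.

```python
def encode(numbers: list[int]) -> str:
    """Put several numbers on one cord; clusters are separated by '-'."""
    clusters: list[str] = []
    for n in numbers:
        if n <= 0 or n % 10 == 0:
            raise ValueError("toy encoding needs a positive number with a nonzero units digit")
        digits = str(n)
        for d in digits[:-1]:
            clusters.append("s" * int(d))       # higher places: ordinary knots ('' for a zero)
        clusters.append("L" * int(digits[-1]))  # units place: the different kind of knot
    return "-".join(clusters)


def decode(cord: str) -> list[int]:
    """Read clusters left to right; a units cluster closes the current number."""
    numbers: list[int] = []
    value = 0
    for cluster in cord.split("-"):
        if cluster.startswith("L"):
            numbers.append(value * 10 + len(cluster))
            value = 0
        else:
            value = value * 10 + len(cluster)
    return numbers


cord = encode([305, 42, 7])
print(cord)          # sss--LLLLL-ssss-LL-LLLLLLL
print(decode(cord))  # [305, 42, 7]
```

A number whose units digit is 0 would leave no units knots to mark its end. This toy version therefore simply rejects such numbers rather than guess at how kipu makers handled that case.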
Urarina (which Ruhlen puts in this group, but others consider an isolate) boasts two very unusual features among the world's languages: it has no /p/ sound (note that Quechua pusaq '8' was borrowed as fusa-); and it is consistently OVS.
The Cherente word for 2 (ponhuane) analyzes as 'deer track' (since a deer hoofprint has two separate parts).
Michif is hard to figure. Oversimplifying: the nouns, pronouns, and numerals (except 1; cf. Cree peyak) are French, and the verbs are Cree (fairly complex verbs, too). It can't really be considered a pidgin; most likely it developed among bilinguals.
A priori languages are not based on existing languages; they're often an attempt to create a more logical or more organized way of looking at the world. (The lexicon of Loglan and Lojban is not technically a priori; but as 'logical languages' they certainly fit into this category.)
Many projects have 1 = ba or something like it-- almost inevitable if you work out the numbers in alphabetical order. E.g. Leibniz uses consonants for the digits, in alphabetical order; vowels for the powers of ten, also alphabetical, so 1679 = bohilena. But it's Letellier who wins the prize for conciseness, with one letter per digit: e.g. 1679 = ba:co: (the colons represent macrons).
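Leibniz's scheme is easy to mechanize. The letter tables here are inferred from the single example 1679 = bohilena. The letters b, h, l, n for 1, 6, 7, 9 imply the consonants b c d f g h l m n, with j and k skipped. The vowels a, e, i, o mark units, tens, hundreds, and thousands. Two details are assumptions: using u for ten-thousands, and simply omitting zero digits.

```python
CONSONANTS = "bcdfghlmn"  # digits 1-9 (inferred from 1679 = bohilena)
VOWELS = "aeiou"          # 10^0 .. 10^4 (u is an extrapolation)


def leibniz(n: int) -> str:
    """Spell n as consonant (digit) + vowel (power of ten) syllables, highest first."""
    if not 0 < n < 10 ** len(VOWELS):
        raise ValueError("this sketch covers 1-99999")
    syllables = []
    for power, d in enumerate(reversed(str(n))):  # power 0 is the units digit
        if d != "0":                              # assumption: zero digits are omitted
            syllables.append(CONSONANTS[int(d) - 1] + VOWELS[power])
    return "".join(reversed(syllables))


print(leibniz(1679))  # bohilena
print(leibniz(1))     # ba
```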
Hilbe has a showoffy trick for representing higher numbers: rXr is one million to the X power-- e.g. rar = 10^6, rer = 10^12-- up to a million to the millionth power, which has its own name, qar = 10^6000000. And qar to the qarth power is xar. Beats a googolplex to hell any day.
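None of these numbers can actually be written out, so the boast is easiest to check by comparing the base-10 logarithms of their exponents. The sketch uses only what the text gives: qar = 10^6000000 and xar = qar^qar. The comparison shows that qar on its own falls well short of a googolplex (10^(10^100)); it is xar that wins.

```python
import math

# Each huge number N is represented by log10(log10(N)), the size of its exponent.
LOGLOG10_GOOGOLPLEX = 100.0  # googolplex = 10^(10^100)


def loglog10_qar() -> float:
    # qar = (10^6)^(10^6) = 10^(6 * 10^6)
    return math.log10(6 * 10**6)


def loglog10_xar() -> float:
    # xar = qar^qar = 10^(6 * 10^6 * 10^(6 * 10^6)),
    # so log10 of its exponent is log10(6 * 10^6) + 6 * 10^6
    return math.log10(6 * 10**6) + 6 * 10**6


print(loglog10_qar() < LOGLOG10_GOOGOLPLEX)  # True: qar alone is smaller
print(loglog10_xar() > LOGLOG10_GOOGOLPLEX)  # True: xar is vastly larger
```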
Some of the language names are unwieldy or repetitive, and they're represented on the numbers page by the creator's name.
A posteriori languages are based on existing natural languages (or are developments of earlier a posteriori languages, e.g. Ido).
Artlangs are languages developed for personal or artistic reasons alone.
Tepa is worth checking out; it's based on Amerindian models, and designed by an expert in Numic.
Marnen calls DiLingo "possibly the funniest conlang"; I have to agree. It's hard to read some sentences in it without laughing.
Maktalu is a duodecimal language; 11 and 12 are ushi and fani; the others are siblings or ancestors.
Cispa is intended for eight-fingered aliens; it has both octal and decimal variants. Jarrda is spoken by raccoons.
The words for 11-18 in Draseléq include quirky etymologies like 12 = "the divisible one" and 17 = "the imperfect".
My Wedei is a base-6 system; Methaiun is a 5-10-18 system (18 is oranda), since Almeans have five fingers but four toes.