ggg help

ggg is a JavaScript program that lets you construct a context-sensitive grammar.

For a full explanation of what that means, see my upcoming Syntax Construction Kit, or your favorite book on transformational grammar or computer languages.

I've included some sample rules to get you started. Try them out!

(SS among the samples stands for Syntactic Structures; these are the rules from Chomsky’s 1957 book.)

A general warning: Writing rules is a form of programming, and it's very easy to write rules that don't work as you expect. The debugging settings can help you find out why. Read this document carefully to see some of the pitfalls.

Overall operation

The program always starts with the string S. It looks through the production rules to see what S can be expanded to. It keeps applying rules until no more rules can be applied.

Basic rules look like this:

S=A B
S is called a nonterminal symbol, because it appears on the left-hand side of a rule. This one means that S can be replaced with A B.
A=A a|a
Adding this rule means that A can be replaced by a, or by A a.

It's a convention to use capitals for nonterminals and lowercase for terminals: symbols that can't be replaced. However, ggg doesn't enforce this. If you follow the convention, a is a terminal, and that part of the sentence is done. But the A will be replaced by another application of the same rule. (So rules can be recursive.)
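To make the overall operation concrete, here is a minimal Python sketch of the expansion loop. It is not ggg's own JavaScript code: the rule table, the name derive, and the max_steps cap are hypothetical, and it assumes ggg picks one applicable rewrite at random at each step.

```python
import random

# Hypothetical rule table: each symbol maps to its alternatives, and each
# alternative is a list of symbols. These are the two rules above.
RULES = {
    "S": [["A", "B"]],
    "A": [["A", "a"], ["a"]],
}

def derive(rules, start="S", max_steps=100, rng=random):
    """Rewrite one symbol at a time until no rule applies (or max_steps)."""
    string = [start]
    for _ in range(max_steps):
        # Positions holding a symbol that has at least one rule.
        spots = [i for i, sym in enumerate(string) if sym in rules]
        if not spots:
            break  # nothing left to rewrite: the derivation is done
        i = rng.choice(spots)
        string[i:i + 1] = rng.choice(rules[string[i]])
    return string

# B has no rule here, so it survives to the output even though it is a
# capital letter. The number of a's varies from run to run.
print(" ".join(derive(RULES)))
```

The max_steps cap is only a safety net for recursive rules like A=A a, which could in principle keep choosing the recursive option.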

Production rules

Let's look at the syntax again:
A=A a|a
The = separates the category, on the left, from its replacement, on the right. You can use instead. The | separates multiple possibilities, and ggg will choose randomly among them. So this is short for saying
A=A a
A=a
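The | shorthand can be thought of as a simple split into separate rules. A minimal Python sketch; the name parse_rule is hypothetical, and the sketch ignores environments (/) and optional elements.

```python
def parse_rule(line):
    """Split a rule like 'A=A a|a' into one (left, right) pair per option."""
    left, right = line.split("=", 1)  # split at the first = only
    return [(left.strip(), option.strip()) for option in right.split("|")]

print(parse_rule("A=A a|a"))  # [('A', 'A a'), ('A', 'a')]
```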

Space-separated or not

When making toy grammars with letters, it’s most useful to write your rules without spaces. But once you're dealing with words, you want to separate the parts of a rule with spaces.

Select Single letters for the first case, Words for the second.
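The two settings amount to two ways of cutting a rule into symbols. A Python sketch under stated assumptions: the function and mode names are hypothetical, and it assumes spaces are simply ignored in Single letters mode.

```python
def tokenize(text, mode):
    """Cut one side of a rule into symbols.

    mode="letters": every non-space character is its own symbol.
    mode="words":   symbols are separated by whitespace.
    """
    if mode == "letters":
        return [ch for ch in text if not ch.isspace()]
    if mode == "words":
        return text.split()
    raise ValueError(f"unknown mode: {mode!r}")

print(tokenize("Aac", "letters"))      # ['A', 'a', 'c']
print(tokenize("NP V", "letters"))     # ['N', 'P', 'V']: NP falls apart
print(tokenize("NP Fin VP", "words"))  # ['NP', 'Fin', 'VP']
```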

Optional elements

If you check Allow optional symbols with (), you can put optional elements in parentheses.

E.g.

A=(A)a(c)
is short for
A=a
A=Aa
A=ac
A=Aac
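The expansion above is every combination of "leave it out" and "keep it" for each parenthesized part. A minimal Python sketch of that idea (not ggg's code), assuming parentheses are balanced and never nested; the name expand_optional is hypothetical.

```python
import itertools
import re

def expand_optional(rhs):
    """Expand every (x) in rhs into a version without x and one with it."""
    # With a capture group, re.split keeps each optional part at an odd index.
    parts = re.split(r"\(([^()]*)\)", rhs)
    choices = [[p] if i % 2 == 0 else ["", p] for i, p in enumerate(parts)]
    # Join each combination and tidy up spaces (this matters in Words mode).
    return [" ".join("".join(c).split()) for c in itertools.product(*choices)]

print(expand_optional("(A)a(c)"))  # ['a', 'ac', 'Aa', 'Aac']
```

The four results are exactly the four rules listed above, just in a different order.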

Null symbol

It can be useful to have a null symbol which isn’t actually output. Use ø for this. E.g. in the SS rules I have
Tense=past
Tense=VPL/NPS _
Tense=ø/NPP _
This means that Tense can be replaced with past | VPL | ø. (The latter two options are conditional; see below.)

This is one way to delete things: the ø won’t appear in the final string.

But until it’s deleted, it can also serve as input to other rules, which can be surprisingly useful.
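One way to picture this: during the derivation ø is an ordinary symbol that rules can see, and it is only filtered out when the final string is printed. A Python sketch under that assumption; render and the sample string are hypothetical.

```python
NULL = "ø"

def render(symbols):
    """Join the final symbols for display, dropping every null symbol."""
    return " ".join(s for s in symbols if s != NULL)

derivation = ["NPP", "ø", "V"]  # hypothetical string partway through
print(NULL in derivation)       # True: rules can still see and match ø
print(render(derivation))       # NPP V
```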

Debugging help

Check Show parsed rules to have the program show what it thinks the rules are. Since it expands | and transformations (see below) into fuller representations, this can help you understand why a rule is or isn't working.

Check Show debugging output to show a complete derivation. The program will indicate at each step what rules it thinks it can apply— the one actually selected will be boldfaced. Then it will show the output at that step. And so on till it can't apply any more.

Environments

So far the rules have been context-free. You can create context-sensitive rules by using environments. A simple example:
B=p/A _
The syntax may be familiar to you from the SCA². The meaning is similar: “B can turn into p if it occurs just after A.” The _ in the environment is required and represents the replaced element (here, B). So this rule could apply to A B or A A B C, but not to B A.
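The check an environment like A _ performs can be sketched as "look at the symbol just before the target". A minimal Python sketch; the function name is hypothetical, and rewriting the leftmost match is an assumption (ggg may choose among matches differently).

```python
def apply_env_rule(symbols, target, replacement, before):
    """Rewrite the first `target` immediately preceded by `before`.

    B=p/A _ corresponds to apply_env_rule(s, "B", ["p"], "A").
    Returns a new list, or None if the environment never matches.
    """
    for i in range(1, len(symbols)):
        if symbols[i] == target and symbols[i - 1] == before:
            return symbols[:i] + replacement + symbols[i + 1:]
    return None

print(apply_env_rule(["A", "A", "B", "C"], "B", ["p"], "A"))  # ['A', 'A', 'p', 'C']
print(apply_env_rule(["B", "A"], "B", ["p"], "A"))            # None
```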

Existence check

Sometimes it’s useful to check only whether a particular element exists. You can do this with an environment that has no _. You can check only one element, which can be a terminal or a nonterminal. For instance:
B=p/A
This says to replace B with p only if the element A occurs anywhere in the string. So this rule might be applied to A B but not to B B.
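By contrast with the A _ environment, the existence check ignores position entirely. A Python sketch under the same assumptions as before (hypothetical name, leftmost target rewritten).

```python
def apply_exists_rule(symbols, target, replacement, required):
    """B=p/A: rewrite the first `target`, but only if `required` occurs anywhere."""
    if required not in symbols or target not in symbols:
        return None
    i = symbols.index(target)
    return symbols[:i] + replacement + symbols[i + 1:]

print(apply_exists_rule(["A", "B"], "B", ["p"], "A"))       # ['A', 'p']
print(apply_exists_rule(["B", "B"], "B", ["p"], "A"))       # None
print(apply_exists_rule(["B", "C", "A"], "B", ["p"], "A"))  # ['p', 'C', 'A']
```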

Note that this is an alternative, and often simpler, way to handle things like agreement.

Transformations

The environment rule B=p/A _ could also be stated
A B=A p
This does what it looks like: replace A B with A p. Transformations allow powerful rearrangements of the string and are generally required for handling natural languages. Internally, ggg rewrites rules with environments as rules with transformations.
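Here is one guess at how an environment rule could be turned into the equivalent transformation: copy the context onto both sides. This is a Python sketch, not ggg's internal method; it assumes Words mode and an environment with exactly one _, and it handles neither | nor existence checks.

```python
def env_to_transformation(rule):
    """Turn 'B=p/A _' into the transformation 'A B=A p'."""
    lhs, rest = rule.split("=", 1)
    replacement, env = rest.split("/", 1)
    before, after = env.split("_", 1)  # context on each side of the _
    new_lhs = " ".join(f"{before} {lhs} {after}".split())
    new_rhs = " ".join(f"{before} {replacement} {after}".split())
    return f"{new_lhs}={new_rhs}"

print(env_to_transformation("B=p/A _"))        # A B=A p
print(env_to_transformation("Tense=ø/NPP _"))  # NPP Tense=NPP ø
```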

Wildcards

A transformation can include wildcards, indicated with *:
a * a=q
This means “replace a sequence a...a, where ... is anything at all, with q.” You can also put * in the replacement string. In this case, whatever was found in the * location will be copied to the output string in the appropriate place.
* VP=* VP *
This curious rule is used in the French sample rules. As you can see, the element before the verb is copied after it: e.g. 3p VP becomes 3p VP 3p. This is a cheap way of handling verb agreement: the first copy will eventually apply to the preceding NP, and the second to the verb.
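A Python sketch of wildcard matching with copying. The details are assumptions, not ggg's documented behavior: at most one * per left side, * matching one or more symbols, the leftmost and shortest match winning, and every * on the right side receiving a copy of what the * matched. All names are hypothetical.

```python
def find_match(symbols, lhs):
    """Leftmost match of lhs; returns (start, end, captured) or None."""
    for start in range(len(symbols)):
        if "*" not in lhs:
            if symbols[start:start + len(lhs)] == lhs:
                return start, start + len(lhs), []
            continue
        star = lhs.index("*")
        before, after = lhs[:star], lhs[star + 1:]
        if symbols[start:start + len(before)] != before:
            continue
        first = start + len(before)  # first symbol the * may absorb
        # Try the shortest run for * first (at least one symbol).
        for mid_end in range(first + 1, len(symbols) - len(after) + 1):
            if symbols[mid_end:mid_end + len(after)] == after:
                return start, mid_end + len(after), symbols[first:mid_end]
    return None

def apply_transformation(symbols, lhs, rhs):
    """Rewrite the leftmost match of lhs as rhs; each * in rhs gets a copy."""
    found = find_match(symbols, lhs)
    if found is None:
        return None
    start, end, captured = found
    out = []
    for sym in rhs:
        out.extend(captured if sym == "*" else [sym])
    return symbols[:start] + out + symbols[end:]

print(apply_transformation(["a", "x", "y", "a"], ["a", "*", "a"], ["q"]))  # ['q']
print(apply_transformation(["3p", "VP"], ["*", "VP"], ["*", "VP", "*"]))   # ['3p', 'VP', '3p']
```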

Rule order

For more complicated grammars, rule ordering becomes important.

The basic rule is: separate groups of rules with +. ggg will keep applying rules in one group until it can't any more, then move on to the next group, and so on.

For an example, see the SS Verb Complex sample rules.
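The group-by-group behavior can be sketched in Python as two nested loops: exhaust one group, then move on. This is an illustration, not ggg's code; the rules are hypothetical, written as (left side, right side) symbol lists, and choosing randomly among matches is an assumption.

```python
import random

def matches(symbols, rules):
    """All (position, lhs, rhs) triples where a rule's left side occurs."""
    hits = []
    for lhs, rhs in rules:
        n = len(lhs)
        for i in range(len(symbols) - n + 1):
            if symbols[i:i + n] == lhs:
                hits.append((i, lhs, rhs))
    return hits

def run_groups(symbols, groups, max_steps=1000, rng=random):
    """Exhaust each group of rules in turn, then move on to the next group."""
    for rules in groups:
        for _ in range(max_steps):
            hits = matches(symbols, rules)
            if not hits:
                break  # this group can't apply any more
            i, lhs, rhs = rng.choice(hits)
            symbols = symbols[:i] + rhs + symbols[i + len(lhs):]
    return symbols

expand_s = (["S"], ["A", "B"])  # hypothetical S=A B
b_to_b = (["B"], ["b"])         # hypothetical B=b
print(run_groups(["S"], [[expand_s], [b_to_b]]))  # ['A', 'b']
print(run_groups(["S"], [[b_to_b], [expand_s]]))  # ['A', 'B']: B=b ran too early
```

In the second call, the B=b group is finished before any B exists, so a later group can never feed an earlier one.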

More complicated rulesets are defined by placing one of the following symbols before the ruleset:

? All rules in this set are optional
1 Apply one of the rules in this set
1? This set is optional, and just one can apply
Note that rules will keep applying if they can, which can make the program do things you don't expect. E.g., from the French rules:
Fin VP=Fin VP Fin
This must be marked 1. Otherwise, starting from NP Fin VP, it will produce NP Fin VP Fin, then NP Fin VP Fin Fin, then NP Fin VP Fin Fin Fin, and so on, because the rule can always apply to its own output.
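The runaway can be seen with a tiny Python sketch that applies Fin VP=Fin VP Fin either once (as in a set marked 1) or repeatedly. It assumes the leftmost match is rewritten; the function names and the small max_steps standing in for "forever" are hypothetical.

```python
def rewrite_once(symbols, lhs, rhs):
    """Rewrite the leftmost occurrence of lhs with rhs, or return None."""
    n = len(lhs)
    for i in range(len(symbols) - n + 1):
        if symbols[i:i + n] == lhs:
            return symbols[:i] + rhs + symbols[i + n:]
    return None

def run_set(symbols, lhs, rhs, marked_1, max_steps=4):
    """Apply a one-rule set once (marked 1) or until it stops matching.

    max_steps stands in for 'forever': this rule never stops matching.
    """
    for _ in range(1 if marked_1 else max_steps):
        result = rewrite_once(symbols, lhs, rhs)
        if result is None:
            break
        symbols = result
    return symbols

start = ["NP", "Fin", "VP"]
lhs, rhs = ["Fin", "VP"], ["Fin", "VP", "Fin"]
print(run_set(start, lhs, rhs, marked_1=True))   # ['NP', 'Fin', 'VP', 'Fin']
print(run_set(start, lhs, rhs, marked_1=False))  # ['NP', 'Fin', 'VP', 'Fin', 'Fin', 'Fin', 'Fin']
```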

Morphology

The basic idea from Syntactic Structures is to use transformations for morphology. E.g.
ing read=reading
This gets tedious if you have multiple forms to generate. So there are special rules marked with µ (for ‘morphology’; just copy-and-paste the letter). E.g. in the SS rules we have, in part:
µ µ VPL past en ing ø
µ read reads read read reading
µ eat eats ate eaten eating
The first line is the key to the rest of the entries, and defines what affixes are supported. So the above lines tell ggg that the ing form of read is reading, the past form of eat is ate, and so on.

You must write your rules so that the affix comes before the word it attaches to (e.g. ing read, not read ing).

If an entry ends early, the word will be unchanged. This can be used for defective paradigms (e.g. can has no participles). I also use it for English be: its entry has an extra form in the last column, the non-3s present tense, while every other verb's entry stops before that column, so that form defaults to the verb root.

There's no way to have default forms, I’m afraid, nor multiple categories (e.g. verbs vs nouns).

Note that morphological rules apply at the end, and therefore can't be corrected by further rules.
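The µ table can be read as a lookup from (affix, root) to a finished form. A Python sketch under stated assumptions: the function names are hypothetical, and it assumes an affix is consumed even when its word has no listed form (so the word comes out unchanged, as described above).

```python
# The three µ lines from the SS sample rules above.
SS_LINES = [
    "µ µ VPL past en ing ø",
    "µ read reads read read reading",
    "µ eat eats ate eaten eating",
]

def parse_mu(lines):
    """Return (affixes, table) where table maps (affix, root) -> form."""
    rows = [line.split()[1:] for line in lines]  # drop the leading µ marker
    affixes = rows[0][1:]                        # key line minus root column
    table = {}
    for row in rows[1:]:
        root, forms = row[0], row[1:]
        # zip stops at the shorter list, so an entry that ends early
        # simply has no form for the remaining affixes.
        for affix, form in zip(affixes, forms):
            table[(affix, root)] = form
    return affixes, table

def apply_mu(symbols, affixes, table):
    """Fuse each 'affix word' pair; a missing form leaves the word unchanged."""
    out, i = [], 0
    while i < len(symbols):
        if symbols[i] in affixes and i + 1 < len(symbols):
            affix, word = symbols[i], symbols[i + 1]
            out.append(table.get((affix, word), word))
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out

affixes, table = parse_mu(SS_LINES)
print(apply_mu(["John", "past", "eat"], affixes, table))  # ['John', 'ate']
print(apply_mu(["ø", "read"], affixes, table))            # ['read']
```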

Some advice

Like the SCA², ggg is a simple but powerful tool, which can do much more than you might think. But you do have to think a little like a syntactician and a little like a programmer. Read the book once it comes out; in the meantime, study the examples closely to see some of the possible tricks.

Start with S and give the most general rules first— e.g. those that define the basic shape of a sentence.

ggg operates only on strings, not trees. That is, it has no record of the derivation so far, only the current string. Let's say you have an SOV language and want to produce SVO order with a transformation. And say you turn noun phrases into N + Det. You write:

S=NP NP V
NP=N Det
+
NP V=V NP
This will only generate N Det N Det V. That’s because the last rule never finds any NPs to apply to: the second rule has already replaced them all. Put general transformations early, while the things they apply to are still in the derivation. This will work:
S=NP NP V
NP V=V NP/NP _
NP=N Det
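To check the difference between the two grammars, this Python sketch enumerates every string each one can reach, instead of sampling one at random as ggg does. The environment rule NP V=V NP/NP _ is written as its equivalent transformation NP NP V=NP V NP; all function names are hypothetical.

```python
def rewrites(symbols, rules):
    """Every string reachable by one application of one rule."""
    results = []
    for lhs, rhs in rules:
        n = len(lhs)
        for i in range(len(symbols) - n + 1):
            if symbols[i:i + n] == lhs:
                results.append(symbols[:i] + rhs + symbols[i + n:])
    return results

def all_outputs(groups, start=("S",)):
    """All final strings, exhausting each group of rules before the next."""
    current = {tuple(start)}
    for rules in groups:
        finished, frontier, seen = set(), list(current), set(current)
        while frontier:
            s = frontier.pop()
            nexts = [tuple(t) for t in rewrites(list(s), rules)]
            if not nexts:
                finished.add(s)  # no rule in this group applies any more
            for t in nexts:
                if t not in seen:
                    seen.add(t)
                    frontier.append(t)
        current = finished
    return {" ".join(s) for s in current}

first = [[(["S"], ["NP", "NP", "V"]), (["NP"], ["N", "Det"])],
         [(["NP", "V"], ["V", "NP"])]]
second = [[(["S"], ["NP", "NP", "V"]),
           (["NP", "NP", "V"], ["NP", "V", "NP"]),
           (["NP"], ["N", "Det"])]]
print(sorted(all_outputs(first)))   # ['N Det N Det V']
print(sorted(all_outputs(second)))  # ['N Det N Det V', 'N Det V N Det']
```

Under these assumptions, the second grammar can reach the SVO order, though if rules within a group are chosen at random, not every run will.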

ggg is intended to model syntax, not morphology, so you wouldn’t want to model Sanskrit with it. But its morphology operations should be sufficient for languages like English or French. Put the morphology bits near the end. If you use µ rules, make sure you generate each affix before its word, and list all the possible affixes in the first µ rule.

If you need agreement, use some of the tricks above: copy an affix, or use the existence check.

Don’t overdo the number of words or tenses you handle; that won’t improve your understanding of the syntax, it just causes you extra work.

Finally... you may just want a tool that knows more about language. But, hey, I’ve got some tools that fit the bill! Try out the Generative Tree Gadget or the Minimalism gadget.

