For a full explanation of what that means, see my upcoming Syntax Construction Kit, or your favorite book on transformational grammar or computer languages.
I've included some sample rules to get you started. Try them out!
(SS among the samples stands for Syntactic Structures; these are the rules from Chomsky’s 1957 book.)
A general warning: Writing rules is a form of programming, and it's very very easy to write rules that don't work as you expect them to. The debugging settings can be useful in finding out why that is. Read this document carefully to see some of the pitfalls.
Basic rules look like this:
S=A B
S is called a nonterminal symbol, because it appears on the left-hand side of a rule. This one means that S can be replaced with A B.
A=A a|a
Adding this rule means that A can be replaced by a, or by A a.
It's a convention to use capitals for nonterminals and lowercase for terminals, that is, symbols that can't be replaced. However, ggg doesn't enforce this. If you are following the convention, a is a terminal and that part of the sentence is done. But the A will be replaced by another application of the same rule. (So, rules can be recursive.)
A=A a|a
The = separates the category, on the left, from its replacement, on the right. You can use → instead. The | separates multiple possibilities; ggg will choose randomly from these. So this is short for saying
A=A a
A=a
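To make the idea of random rewriting concrete, here is a minimal Python sketch of the two rules above. It is not ggg's actual code: the RULES dictionary, the generate function and the leftmost-first strategy are all assumptions made for illustration.

```python
import random

# Hypothetical in-memory form of the rules S=A B and A=A a|a.
RULES = {
    "S": [["A", "B"]],
    "A": [["A", "a"], ["a"]],
}

def generate(start="S", rng=None, max_steps=1000):
    """Rewrite the leftmost replaceable symbol until none is left."""
    rng = rng or random.Random()
    tokens = [start]
    for _ in range(max_steps):  # guard against runaway recursion
        idx = next((i for i, t in enumerate(tokens) if t in RULES), None)
        if idx is None:
            break  # only symbols without rules remain
        # Pick one alternative at random, as | does.
        tokens[idx:idx + 1] = rng.choice(RULES[tokens[idx]])
    return tokens

print(" ".join(generate()))
```

Each run prints one or more a's followed by B. B has no rule of its own here, so it is simply left in place.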
Select Single letters for the first case, Words for the second.
E.g.
A=(A)a(c)
is short for
A=a
A=Aa
A=ac
A=Aac
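The expansion of parenthesized optional elements can be sketched in Python as follows. The function name expand_optionals is made up, and the sketch assumes single-letter symbols and no nested parentheses; ggg itself may handle this differently.

```python
import itertools
import re

def expand_optionals(rhs):
    """Expand each (x) into 'absent' and 'present' variants."""
    choices = []
    for m in re.finditer(r"\(([^()]*)\)|([^()]+)", rhs):
        if m.group(1) is not None:
            choices.append(["", m.group(1)])  # optional: leave out or keep
        else:
            choices.append([m.group(2)])      # required text
    return ["".join(combo) for combo in itertools.product(*choices)]

for rhs in expand_optionals("(A)a(c)"):
    print("A=" + rhs)
```

This yields the same four rules as the list above, though not necessarily in the same order.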
Tense=past
Tense=VPL/NPS _
Tense=ø/NPP _
This means that Tense can be replaced with past | VPL | ø. (The latter two options are conditional; see below.)
This is one way you delete things. The ø won’t be output to the final string.
But till it’s deleted, it can also be an input for other rules, which can be surprisingly useful.
Check Show debugging output to show a complete derivation. The program will indicate at each step what rules it thinks it can apply— the one actually selected will be boldfaced. Then it will show the output at that step. And so on till it can't apply any more.
B=p/A _
The syntax may be familiar to you from the SCA². The meaning is similar: “B can turn into p if it occurs just after A.” The _ in the environment is required and represents the replaced element (here, B). So this rule could apply to A B or a B or A A B C but not B A.
B=p/A
This says to replace B with p only if the element A occurs anywhere in the string. So this rule might be applied to A B but not to B B.
Note, this is an alternative, often simpler way to handle things like agreement.
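The difference between an environment (B=p/A _) and an existence check (B=p/A) can be sketched in Python like this. Both function names are hypothetical, and rewriting only the first matching B is an assumption; the sketch illustrates the two conditions, not ggg's internals.

```python
def apply_after(tokens, target, replacement, before):
    """B=p/A _ : rewrite the first target that directly follows `before`."""
    for i in range(1, len(tokens)):
        if tokens[i] == target and tokens[i - 1] == before:
            return tokens[:i] + [replacement] + tokens[i + 1:]
    return None  # rule does not apply

def apply_if_present(tokens, target, replacement, needed):
    """B=p/A : rewrite the first target if `needed` occurs anywhere."""
    if needed in tokens and target in tokens:
        i = tokens.index(target)
        return tokens[:i] + [replacement] + tokens[i + 1:]
    return None  # rule does not apply

print(apply_after("A A B C".split(), "B", "p", "A"))
print(apply_after("B A".split(), "B", "p", "A"))
print(apply_if_present("B A".split(), "B", "p", "A"))
```

The string B A shows the contrast: the environment rule fails because nothing precedes B, while the existence check succeeds because A is present somewhere.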
A B=A p
This does what it looks like: replace A B with A p. Transformations allow powerful rearrangements of the string and are generally required for handling natural languages. Internally, ggg rewrites rules with environments as rules with transformations.
a * a=q
This means “replace a sequence a...a, where ... is anything at all, with q.” You can also put * in the replacement string. In this case, whatever was found in the * location will be copied to the output string in the appropriate place.
* VP=* VP *
The above curious rule is used in the French sample rules. As you can see, the element before the verb is copied after it: e.g. 3p VP becomes 3p VP 3p. This is a cheap way of handling verb agreement: the first copy will eventually apply to the preceding NP; the second one will apply to the verb.
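Here is a rough Python sketch of the a * a=q kind of rule, where the * sits between two fixed symbols. It assumes the * captures the shortest run of symbols between them, and that every * in the replacement receives a copy of that run. The function apply_wildcard is hypothetical, and the sketch does not try to model a leading * as in the French rule.

```python
def apply_wildcard(tokens, lhs, rhs):
    """Apply a rule whose left side is: prefix symbols, *, suffix symbols."""
    star = lhs.index("*")
    prefix, suffix = lhs[:star], lhs[star + 1:]
    for i in range(len(tokens) - len(prefix) + 1):
        if tokens[i:i + len(prefix)] != prefix:
            continue
        gap_start = i + len(prefix)
        for j in range(gap_start, len(tokens) - len(suffix) + 1):
            if tokens[j:j + len(suffix)] == suffix:
                captured = tokens[gap_start:j]  # what the * matched
                out = []
                for t in rhs:
                    out.extend(captured if t == "*" else [t])
                return tokens[:i] + out + tokens[j + len(suffix):]
    return None  # rule does not apply

print(apply_wildcard("x a b c a y".split(), "a * a".split(), ["q"]))
print(apply_wildcard("x a b c a y".split(), "a * a".split(), ["q", "*"]))
```

The second call uses a made-up rule, a * a=q *, just to show the captured material being copied into the output.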
The basic rule is: separate groups of rules with +. ggg will keep applying rules in one group until it can't any more, then move on to the next group, and so on.
For an example, see the SS Verb Complex sample rules.
More complicated rulesets are defined with the following symbols, placed before the ruleset:
? All rules in this set are optional
1 Apply one of the rules in this set
1? This set is optional, and just one can apply
Note that rules will keep applying if they can, which can make the program do things you don't expect. E.g., from the French rules:
Fin VP=Fin VP Fin
This must be marked 1. Otherwise a sentence will produce NP Fin VP Fin, then NP Fin Fin VP Fin, then NP Fin Fin Fin VP, and so on, because the rule can apply to its own output.
ing read=reading
This gets tedious if you have multiple forms to generate. So there are special rules marked with µ (for ‘morphology’; just copy-and-paste the letter). E.g. in the SS rules we have, in part:
µ µ VPL past en ing ø
µ read reads read read reading
µ eat eats ate eaten eating
The first line is the key to the rest of the entries, and defines what affixes are supported. So the above lines tell ggg that the ing form of read is reading, the past form of eat is ate, and so on.
You must write your rules such that the affix comes first.
If a rule ends early, the word will be unchanged. This can be used for defective paradigms (e.g. ‘can’ simply has no participles). I also use it to add an additional form for English ‘be’, to store the non-3s form of the present tense, which for every other verb defaults to the verb root.
There's no way to have default forms, I’m afraid, nor multiple categories (e.g. verbs vs nouns).
Note that morphological rules apply at the end, and therefore can't be corrected by further rules.
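The µ table lookup can be pictured with the following Python sketch. The parsing and the names parse_mu and inflect are assumptions for illustration, not ggg's real implementation. It relies on the affix coming directly before the word, and it leaves the word unchanged when a row ends before the requested column.

```python
MU_RULES = """\
µ µ VPL past en ing ø
µ read reads read read reading
µ eat eats ate eaten eating
"""

def parse_mu(text):
    """Return (affix list, {root: {affix: form}}) from µ lines."""
    rows = [line.split()[1:] for line in text.splitlines() if line.strip()]
    affixes = rows[0][1:]  # first line is the key; skip the root column
    table = {}
    for row in rows[1:]:
        root, forms = row[0], row[1:]
        table[root] = dict(zip(affixes, forms))  # short rows just stop early
    return affixes, table

def inflect(tokens, affixes, table):
    """Replace each 'affix word' pair with the form from the table."""
    out, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in affixes and nxt in table:
            out.append(table[nxt].get(tok, nxt))  # missing form: word unchanged
            i += 2
        else:
            out.append(tok)
            i += 1
    return out

AFFIXES, TABLE = parse_mu(MU_RULES)
print(inflect("ing read".split(), AFFIXES, TABLE))
print(inflect("past eat".split(), AFFIXES, TABLE))
```

In this sketch ø eat comes out as plain eat, because neither sample row has an entry in the ø column.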
Start with S and give the most general rules first— e.g. those that define the basic shape of a sentence.
ggg operates only on strings, not trees. That is, it does not really know the derivation so far. Let's say you have a SOV language and want to have SVO as a transformation. And say you turn noun phrases into N + Det. You write:
S=NP NP V
NP=N Det
+
NP V=V NP
This will only generate N Det N Det V. That’s because the last rule never finds any NPs to apply to: they were eliminated by the second rule. Put general transformations early, while the things they apply to are still in the derivation. This will work:
S=NP NP V
NP V=V NP/NP _
NP=N Det
ggg is intended to model syntax, not morphology. So you wouldn’t want to model Sanskrit with it. But its morphology operations should be sufficient for languages like English or French. Put the morphology bits near the end. If you use µ rules, make sure you generate affixes before the word, and list all the possible affixes in the first µ rule.
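Going back to the SOV example, the ordering pitfall can be reproduced with a small Python sketch. It assumes that rules within a group are tried in the order listed, that the leftmost match is rewritten, and that a group runs until nothing in it applies; ggg's real selection strategy may differ. The environment rule NP V=V NP/NP _ is written here as the equivalent transformation NP NP V=NP V NP. All function names are made up.

```python
def find(tokens, pattern):
    """Index of the leftmost occurrence of pattern in tokens, or None."""
    n = len(pattern)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == pattern:
            return i
    return None

def rule(text):
    """Turn 'X Y=Z' into a (left side, right side) pair of symbol lists."""
    lhs, rhs = text.split("=")
    return lhs.split(), rhs.split()

def run(groups, start="S", max_steps=100):
    tokens = [start]
    for group in groups:                # groups are what + separates
        for _ in range(max_steps):      # guard against runaway rules
            for lhs, rhs in group:
                i = find(tokens, lhs)
                if i is not None:
                    tokens[i:i + len(lhs)] = rhs
                    break               # one rule applied; start over
            else:
                break                   # nothing applies; next group
    return " ".join(tokens)

broken = [[rule("S=NP NP V"), rule("NP=N Det")], [rule("NP V=V NP")]]
working = [[rule("S=NP NP V"), rule("NP NP V=NP V NP"), rule("NP=N Det")]]

print(run(broken))   # N Det N Det V
print(run(working))  # N Det V N Det
```

In the first ruleset the NPs are gone before the transformation gets its turn; in the second, the transformation runs while they are still in the string.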
If you need agreement, use some of the tricks above: copy an affix, or use the existence check.
Don’t overdo the number of words or tenses you handle; that won’t improve your understanding of the syntax, it just causes you extra work.
Finally... you may just want a tool that knows more about language. But, hey, I’ve got some tools that fit the bill! Try out the Generative Tree Gadget or the Minimalism gadget.