Agreement Rules with Conditions
Monday, May 25th, 2009
In many languages it is possible for the spelling of one word to have an effect on adjacent words. For example:
- In English, the indefinite article ‘a’ becomes ‘an’ before a vowel.
- In Welsh, the conjunction ‘a’ becomes ‘ac’ before a vowel.
- In French, the definite articles ‘le’ and ‘la’ become ‘l’’ (l-apostrophe) before a vowel.
The left-right agreement rules described previously go some way towards meeting this requirement, but their pattern-matching ability is limited to whole words only: they cannot look inside a word to make decisions according to how it is spelt. It would be possible to manually enter all the required tags into the lexicon, but I would prefer an automated solution for obvious reasons.
The method I’ve adopted is to extend the agreement rule syntax to allow conditions to be specified, much as decomposition rules do already. The syntax will be a little different:
agreement <direction> <pattern>
where <condition>
and <condition>
...
and <condition>;
(I think this is more self-explanatory than the current decomposition rule syntax, and if decomposition remains as a distinct rule type - see below - then I will be changing it to match.)
The conditions themselves will use the existing expression syntax, with the addition of a new operator called eval:match. This takes two arguments, a regular expression and a token, and returns true if and only if the token matches the regular expression.
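As a rough illustration of the semantics of the new matching operator, here is a minimal Python sketch. The name eval_match is a hypothetical stand-in, not part of the rule language. The sketch assumes that "matches" means the pattern may match anywhere in the token unless it is anchored, which is what the ^ in the examples suggests, and that the regular expression is bound first and the token supplied second.

```python
import re
from typing import Callable


def eval_match(pattern: str) -> Callable[[str], bool]:
    """Hypothetical stand-in for the matching operator.

    Binds a regular expression and returns a predicate over tokens.
    """
    compiled = re.compile(pattern)

    def predicate(token: str) -> bool:
        # re.search leaves anchoring to the pattern itself (e.g. a leading ^).
        return compiled.search(token) is not None

    return predicate


# Assumed usage: bind the pattern once, then test individual tokens.
starts_with_vowel = eval_match("^[aeiou]")
```

Calling starts_with_vowel on a token then yields a plain boolean that a condition could test.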
A suitable rule for distinguishing ‘a’ from ‘an’ in English might then be:
agreement rightward (a[+vowel-after] $x)
where ((eval:match "^[aeiou]") $x);
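The effect of a rule like this can be sketched in Python. This is only an illustration under stated assumptions: the Token class, the tag_vowel_after and inflect functions, and the representation of tags as a set of strings are all hypothetical, and the pattern tests spelling only, so it is an approximation for English (words such as ‘hour’ would need separate handling).

```python
import re
from dataclasses import dataclass, field


@dataclass
class Token:
    # Hypothetical token representation: surface text plus a set of tags.
    text: str
    tags: set = field(default_factory=set)


INITIAL_VOWEL = re.compile(r"^[aeiou]")


def tag_vowel_after(tokens):
    """Tag each 'a' with 'vowel-after' when the word to its right starts with a vowel."""
    for current, following in zip(tokens, tokens[1:]):
        if current.text == "a" and INITIAL_VOWEL.search(following.text):
            current.tags.add("vowel-after")
    return tokens


def inflect(tokens):
    """Stand-in for an inflectional rule: a tagged 'a' surfaces as 'an'."""
    return " ".join(
        "an" if t.text == "a" and "vowel-after" in t.tags else t.text
        for t in tokens
    )


tokens = [Token(word) for word in "a apple and a pear".split()]
result = inflect(tag_vowel_after(tokens))
```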
Alternatively, the word itself could be tagged as having an initial vowel, then this information transferred to the preceding word using an ordinary agreement rule:
agreement upward $x[+initial-vowel]
where ((eval:match "^[aeiou]") $x);
agreement rightward
(a[+vowel-after] $x[initial-vowel]);
Once a word has been tagged it can be altered as necessary using an inflectional rule. The second method would be appropriate when there are several ways in which the surface form can be affected by an initial vowel, so as to avoid performing the same regular expression match more than once.
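The two-step method can be sketched the same way. Again this is a hypothetical illustration, not the actual rule engine: Token, tag_initial_vowels and propagate are invented names, and the triggers parameter is an assumption added to show how several preceding words could reuse the same tag. The point is that the regular expression is evaluated once per word in the first step, while the second step compares tags only.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Token:
    # Hypothetical token representation: surface text plus a set of tags.
    text: str
    tags: set = field(default_factory=set)


INITIAL_VOWEL = re.compile(r"^[aeiou]")


def tag_initial_vowels(tokens):
    """Step 1: one regular expression test per word, recorded as a tag."""
    for token in tokens:
        if INITIAL_VOWEL.search(token.text):
            token.tags.add("initial-vowel")


def propagate(tokens, triggers=("a",)):
    """Step 2: pass the information to the preceding word by tag lookup alone."""
    for current, following in zip(tokens, tokens[1:]):
        if current.text in triggers and "initial-vowel" in following.tags:
            current.tags.add("vowel-after")


tokens = [Token(word) for word in ["a", "owl", "saw", "a", "mouse"]]
tag_initial_vowels(tokens)
propagate(tokens)
```

Any further rule that depends on an initial vowel could then test the initial-vowel tag rather than repeating the match.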
In the interests of orthogonality I intend to allow conditions to be applied to transformation rules too, using the same syntax. Interestingly, this would make the behaviour of transformation rules and decomposition rules very similar, the main difference being that they are applied at different stages of the translation process. If some mechanism were introduced for explicitly defining translation stages (which has the potential to be a very useful feature in its own right) then decomposition rules - as a distinct rule type - may become unnecessary.