Archive for May, 2009

Agreement Rules with Conditions

Monday, May 25th, 2009

In many languages it is possible for the spelling of one word to have an effect on adjacent words. For example:

  • In English, the indefinite article ‘a’ becomes ‘an’ before a vowel.
  • In Welsh, the conjunction ‘a’ becomes ‘ac’ before a vowel.
  • In French, the definite articles ‘le’ and ‘la’ become ‘l’’ (l-apostrophe) before a vowel.

The left-right agreement rules described previously go some way towards meeting this requirement, but their pattern-matching ability is limited to whole words only: they cannot look inside a word to make decisions according to how it is spelt. It would be possible to manually enter all the required tags into the lexicon, but I would prefer an automated solution, since tagging every affected word by hand would be tedious and error-prone.

The method I’ve adopted is to extend the agreement rule syntax to allow conditions to be specified, much as decomposition rules do already. The syntax will be a little different:

agreement <direction> <pattern>
  where <condition>
  and <condition>
  ...
  and <condition>;

(I think this is more self-explanatory than the current decomposition rule syntax. If decomposition remains a distinct rule type, a question discussed below, then I will change its syntax to match.)

The conditions themselves will use the existing expression syntax, but with the addition of a new operator called eval:match. This takes two arguments, a regular expression and a token, and returns true if and only if the token matches the regular expression.

A suitable rule for distinguishing ‘a’ from ‘an’ in English might then be:

agreement rightward (a[+vowel-after] $x)
  where ((eval:match "^[aeiou]") $x);
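
To make the intended behaviour concrete, here is a rough Python sketch of what a rule like this might do. It is not the actual implementation: the token representation (a dict with a form and a set of tags) and the function names eval_match, apply_rightward_a_rule and make_tokens are all assumptions made for illustration.

```python
import re


def eval_match(pattern):
    """Curried matcher: given a regular expression, return a predicate on tokens."""
    regex = re.compile(pattern)
    return lambda token: regex.search(token["form"]) is not None


def make_tokens(text):
    """Hypothetical token representation: a surface form plus a set of tags."""
    return [{"form": word, "tags": set()} for word in text.split()]


def apply_rightward_a_rule(tokens):
    """Tag 'a' with 'vowel-after' when the next token's spelling starts with a vowel letter."""
    starts_with_vowel = eval_match(r"^[aeiou]")
    for left, right in zip(tokens, tokens[1:]):
        if left["form"] == "a" and starts_with_vowel(right):
            left["tags"].add("vowel-after")
    return tokens


tokens = apply_rightward_a_rule(make_tokens("a apple and a pear"))
print([(t["form"], sorted(t["tags"])) for t in tokens])
```

Note that the condition looks only at spelling, so it inherits the limits of the regular expression: a word such as ‘hour’ would not be matched by this pattern.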

Alternatively, the word itself could be tagged as having an initial vowel, and this information then transferred to the preceding word using an ordinary agreement rule:

agreement upward $x[+initial-vowel]
  where ((eval:match "^[aeiou]") $x);
agreement rightward
  (a[+vowel-after] $x[initial-vowel]);

Once a word has been tagged it can be altered as necessary using an inflectional rule. The second method would be appropriate when there are several ways in which the surface form can be affected by an initial vowel, so as to avoid performing the same regular expression match more than once.
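
The second method can be sketched in Python as three separate passes. This is only an illustration under assumed names: the token dicts, the REPLACEMENTS table and the functions tag_initial_vowels, propagate_rightward and inflect are hypothetical, not part of the system described here.

```python
import re

INITIAL_VOWEL = re.compile(r"^[aeiou]")

# Hypothetical inflection table: (base form, tag) -> surface form.
REPLACEMENTS = {("a", "vowel-after"): "an"}


def make_tokens(text):
    return [{"form": word, "tags": set()} for word in text.split()]


def tag_initial_vowels(tokens):
    """Pass 1: the only place a regular expression is evaluated, once per word."""
    for token in tokens:
        if INITIAL_VOWEL.search(token["form"]):
            token["tags"].add("initial-vowel")
    return tokens


def propagate_rightward(tokens, trigger, tag):
    """Pass 2: ordinary agreement on tags alone, with no regular expression."""
    for left, right in zip(tokens, tokens[1:]):
        if left["form"] == trigger and "initial-vowel" in right["tags"]:
            left["tags"].add(tag)
    return tokens


def inflect(tokens):
    """Pass 3: choose the surface form according to the tags set earlier."""
    words = []
    for token in tokens:
        form = token["form"]
        for (base, tag), surface in REPLACEMENTS.items():
            if form == base and tag in token["tags"]:
                form = surface
        words.append(form)
    return " ".join(words)


tokens = tag_initial_vowels(make_tokens("a orange and a pear"))
tokens = propagate_rightward(tokens, "a", "vowel-after")
print(inflect(tokens))
```

Further rules that depend on an initial vowel could reuse the initial-vowel tag from the first pass by calling propagate_rightward with a different trigger word and tag.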

In the interests of orthogonality I intend to allow conditions to be applied to transformation rules too, using the same syntax. Interestingly this would make the behaviour of transformation rules and decomposition rules very similar, the main difference being that they are applied at different stages of the translation process. If some mechanism were introduced for explicitly defining translation stages (which has the potential to be a very useful feature in its own right) then decomposition rules, as a distinct rule type, may become unnecessary.

Left-Right Agreement Rules

Tuesday, May 5th, 2009

One language feature which I have not yet been able to implement in a satisfactory manner is that of initial consonant mutation as found in Welsh and other Celtic languages. The difficulty lies not with the morphological process itself, which is for the most part straightforward, but rather the decision as to when to perform it.

For example, one of the ways in which the aspirate mutation is triggered in Welsh is when a word is preceded by ‘a’ or ‘ac’ (meaning ‘and’). This cannot easily be expressed using agreement rules (as currently implemented) for two reasons:

  • Agreement rules act on the text as a tree rather than a sequence. Words which are linearly adjacent to each other may be arbitrarily far apart in terms of tree structure. It follows that adjacency cannot be expressed by any fixed set of agreement rules.
  • Agreement rules are applied before transformations (necessarily so, because transformations often depend on tags that have been set by agreement rules). One of the most common uses of a transformation rule is to rearrange the order of the text, and therefore change which words are adjacent to each other.
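
The first point can be illustrated with a small Python sketch. The tree here is an arbitrary nested list with placeholder words, and leaf_paths and tree_distance are hypothetical helpers; the aim is only to show that neighbouring leaves need not be close together in the tree.

```python
def leaf_paths(tree, path=()):
    """Yield (word, path) pairs, where path lists the child indices from the root."""
    if isinstance(tree, str):
        yield tree, path
    else:
        for index, child in enumerate(tree):
            yield from leaf_paths(child, path + (index,))


def tree_distance(p, q):
    """Number of edges between two leaves, going via their lowest common ancestor."""
    common = 0
    for a, b in zip(p, q):
        if a != b:
            break
        common += 1
    return (len(p) - common) + (len(q) - common)


# Placeholder tree: 'd' and 'e' are adjacent in the text but far apart structurally.
tree = [["a", ["b", ["c", "d"]]], "e"]
leaves = list(leaf_paths(tree))
for (w1, p1), (w2, p2) in zip(leaves, leaves[1:]):
    print(w1, w2, tree_distance(p1, p2))
```

Nesting the left-hand branch more deeply increases the distance between ‘d’ and ‘e’ without changing the fact that they are adjacent, which is why no fixed tree pattern can capture adjacency.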

It is possible to work around these restrictions to a limited extent by working backwards from the surface forms that are of interest and determining which intermediate forms could produce them, but this approach is both tedious and deeply unsatisfactory. If mutation (or any other phenomenon) occurs because words are adjacent (as opposed to having a particular structural relationship) then that is how the corresponding rules should be expressed.

I’m therefore satisfied that a new type of rule is justified. It needs to work in much the same way as an agreement rule, but using a pattern which is linear rather than tree-structured. It needs to be applied after any transformations have been completed (so that it can see the final word order) but prior to inflectional rules (so that it is able to influence them).
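
The ordering constraint can be sketched in Python as a sequence of stages. Everything here is hypothetical: the token dicts, the toy transformation that swaps an adjective with the noun after it, and a mutation that handles only c becoming ch. The words are illustrative and not a claim about correct Welsh.

```python
def token(form, *tags):
    return {"form": form, "tags": set(tags)}


def swap_adjective_noun(tokens):
    """Toy transformation: move a token tagged 'adj' after the 'noun' that follows it."""
    out = list(tokens)
    i = 0
    while i < len(out) - 1:
        if "adj" in out[i]["tags"] and "noun" in out[i + 1]["tags"]:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out


def tag_after_a(tokens):
    """Linear agreement: tag whatever word directly follows 'a'."""
    for left, right in zip(tokens, tokens[1:]):
        if left["form"] == "a":
            right["tags"].add("aspirate")
    return tokens


def mutate(tokens):
    """Toy inflection: only the c -> ch case is handled."""
    for tok in tokens:
        if "aspirate" in tok["tags"] and tok["form"].startswith("c"):
            tok["form"] = "ch" + tok["form"][1:]
    return tokens


def run_stages(tokens, stages):
    for stage in stages:
        tokens = stage(tokens)
    return tokens


words = [token("a"), token("coch", "adj"), token("ci", "noun")]
result = run_stages(words, [swap_adjective_noun, tag_after_a, mutate])
print(" ".join(t["form"] for t in result))
```

Running tag_after_a before swap_adjective_noun instead would tag the word that is adjacent before reordering, so the mutation would land on the wrong word in the final text.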

Agreement rules can already be marked as ‘upward’ or ‘downward’. Since these new rules will be so similar, I think extending this syntax to allow ‘leftward’ or ‘rightward’ is appropriate. The pattern will follow exactly the same syntax as now, but with the constraint that it must have the form of a list rather than an arbitrary tree. For example, the mutation rule described above might be expressed as:

agreement rightward (a $x[+aspirate]);

Longer patterns have additional space-separated terms but no extra parentheses. (Internally these are trees such that the left-hand side is atomic and the right-hand side is another list.)
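
As a rough illustration of that internal form, the Python sketch below builds a pattern as nested pairs (an atomic head plus the rest of the list) and matches it against each run of adjacent words. The Term class, the use of None to end the list, and the function names are assumptions for this sketch, not the actual data structures.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Term:
    literal: Optional[str] = None      # None stands for a variable such as $x
    add_tags: Tuple[str, ...] = ()     # tags to set on the matched word


def cons_list(*terms):
    """Build the nested form (head, (head, ... None)) from a flat sequence of terms."""
    node = None
    for term in reversed(terms):
        node = (term, node)
    return node


def flatten(node):
    """Recover the flat sequence of terms from the nested form."""
    terms = []
    while node is not None:
        head, node = node
        terms.append(head)
    return terms


def apply_linear_rule(pattern, tokens):
    """Try the pattern at every position and set tags wherever all terms match."""
    terms = flatten(pattern)
    for start in range(len(tokens) - len(terms) + 1):
        window = tokens[start:start + len(terms)]
        if all(t.literal is None or t.literal == tok["form"]
               for t, tok in zip(terms, window)):
            for t, tok in zip(terms, window):
                tok["tags"].update(t.add_tags)
    return tokens


# Corresponds loosely to: agreement rightward (a $x[+aspirate]);
pattern = cons_list(Term("a"), Term(None, ("aspirate",)))
tokens = [{"form": w, "tags": set()} for w in "bara a caws".split()]
apply_linear_rule(pattern, tokens)
print([(t["form"], sorted(t["tags"])) for t in tokens])
```

A longer pattern is just a longer call to cons_list, which adds one more level of nesting on the right-hand side for each extra term.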

Unlike with upward and downward rules, I doubt it will make any difference whether leftward or rightward rules are applied first, so I am going to arbitrarily apply rightward rules first and leftward rules second.