Bi-directional agreement rules
Thursday, February 26th, 2009Agreement rules are applied recursively, currently starting at the leaf nodes and working upwards towards the root. As I mentioned at the time they were introduced, this allows indirect propagation to occur upwards but not downwards. (Direct propagation is what can be achieved by applying a single rule once; indirect propagation requires two or more such steps.) So far this has not been a problem, but I think it’s almost certain to become one.
The solution I was toying with was to apply the agreement rules repeatedly until a stable solution was obtained. On reflection, I have two concerns about this idea. The first and very obvious one is that there might not be a stable solution. Worse, it would be very difficult to check in advance whether a given set of rules could fail to give a stable solution - certainly it would not be obvious by inspection. Secondly, there are useful techniques which work when rules are executed once but not when they are executed twice or more (and especially if you don’t know in advance how many times or in what order).
An example that relies on once-only execution is setting a tag at level n on condition A then clearing it at leven n-1 on condition B. This implements the condition ‘A and not B’, but with the qualification that if level n is the root (so n-1 does not exist) then it reduces to condition A. I’ve yet to find any other way of achieving this effect that works in all cases.
For this reason, I think a compromise is in order. Rather than allow an arbitrary number of iterations, my intention is to perform two passes: one upwards and one downwards. My reasoning is that this is (minimally) sufficient to allow information to propagate from any point in the tree to any other point in the tree. (Note that would not be so if the downward pass were formed first, so the order is important.)
So that rules are applied once only, they will be marked to indicate whether they refer to the upward or downward pass. The syntax will be:
agreement upward phrase;
agreement downward phrase;
For order of application I see three options:
- Leave the order undefined, as at present.
- Leave the order undefined, but require that the result not be dependent upon it.
- Apply all rules simultaneously to any given node, and require that they not conflict with each other.
- Define the order in which rules are applied.
The difficulty with option 1 is that it provides no assurance that language definition files are portable from one implementation (or version) of the translation system to another. Options 2 and 3 ensure portability, but only if the specified restrictions are enforced, and it turns out that option 3 is rather easier to enforce than option 2. Option 4 would ensure portability too, and in some respects would be more powerful than option 3, but I think the rulesets would be less readable, and I’ve not (yet) seen any evidence that a more powerful syntax is needed.
For these reasons I’m inclined to favour option 3. The method for detecting conflicts will be to iterate through all rules making note of how the tags attached to each node will change, but not actually making those changes immediately. Each node is then re-examined and the changes executed. A conflict is raised if any tag is required to be both added to and removed from a node. As a further refinement it would be helpful to mark nodes during the first phase that will need to be re-visited during the second phase, in order to avoid having to re-visit every node.
[Update 2009-02-27: taking this policy to its logical conclusion, implementation-dependent behaviour is something which ought to be eliminated across the whole of the translation system so far as is possible. This probably won't always be done by applying rules in parallel - for example, the inflection system already has a mechanism for specifying the order in which rules are applied - but I would expect for the most part that the same set of options will be available.]