Unification can be used to translate from one lexicon to another by means of a ‘transfer dictionary’ of the form:
{[lang1=foo,lang2=bar], [lang1=baz,lang2=qux],…}
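As a minimal sketch of how such a lookup might work, the following assumes flat feature structures held as Python dicts with atomic values only. The names `unify`, `TRANSFER` and `translate` are illustrative, not part of any existing system.

```python
def unify(a, b):
    """Unify two flat feature structures (dicts of atomic values).

    Returns the merged dict, or None if a shared feature has
    conflicting values.
    """
    for key in a.keys() & b.keys():
        if a[key] != b[key]:
            return None
    return {**a, **b}


# Hypothetical transfer dictionary mirroring the entries above.
TRANSFER = [
    {"lang1": "foo", "lang2": "bar"},
    {"lang1": "baz", "lang2": "qux"},
]


def translate(word, dictionary):
    """Return every lang2 value whose entry unifies with lang1=word."""
    results = []
    for entry in dictionary:
        merged = unify({"lang1": word}, entry)
        if merged is not None:
            results.append(merged["lang2"])
    return results


print(translate("foo", TRANSFER))  # ['bar']
```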
I had hoped to use something very similar to perform word selection, for example:
{[class=animal,rank=familia,familia=accipitridae,
category=noun,lexeme=eagle], …}
This correctly selects
[category=noun,lexeme=eagle]
when presented with the input
[class=animal,rank=familia,familia=accipitridae].
Unfortunately it would also select
[category=noun,lexeme=eaglet]
if there were a dictionary entry of the form:
{[class=animal,rank=familia,familia=accipitridae,
maturity=child,category=noun,lexeme=eaglet], …}
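because my selection criterion does not specify the required degree of maturity: unification fails only when two values conflict, so a feature that is absent from the input is compatible with any value in the dictionary entry. Similarly, source text that you might expect to select the word ‘vehicle’ would also match ‘hovercraft’ and ‘penny farthing’.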
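The over-selection can be reproduced with a small sketch. It assumes flat feature structures represented as Python dicts. `unify`, `LEXICON` and `select` are hypothetical names chosen for illustration.

```python
def unify(a, b):
    """Unify two flat feature structures; None signals a conflict."""
    for key in a.keys() & b.keys():
        if a[key] != b[key]:
            return None
    return {**a, **b}


# The two dictionary entries discussed above.
LEXICON = [
    {"class": "animal", "rank": "familia", "familia": "accipitridae",
     "category": "noun", "lexeme": "eagle"},
    {"class": "animal", "rank": "familia", "familia": "accipitridae",
     "maturity": "child", "category": "noun", "lexeme": "eaglet"},
]


def select(query, lexicon):
    """Return the lexeme of every entry that unifies with the query."""
    lexemes = []
    for entry in lexicon:
        merged = unify(query, entry)
        if merged is not None:
            lexemes.append(merged["lexeme"])
    return lexemes


query = {"class": "animal", "rank": "familia", "familia": "accipitridae"}
# The query says nothing about maturity, so nothing conflicts with
# maturity=child and both entries are selected.
print(select(query, LEXICON))  # ['eagle', 'eaglet']
```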
It is possible to circumvent this problem by providing a special symbol ‘none’ that is recognised during the unification process and matches only if its counterpart is unspecified. However, requiring the source text author to use this symbol — possibly several times per FD to exclude different attributes — would be an unreasonable burden, and unification does not (so far as I can tell) provide the means to insert it automatically.
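A sketch of the ‘none’ workaround, under the same flat-dict assumption, might look like this. The spelling of the marker (`NONE`) and the decision to strip it from the result are my own assumptions. I have read "matches only if its counterpart is unspecified" literally, so the marker fails against any specified value.

```python
NONE = "none"  # assumed spelling of the special symbol


def unify(a, b):
    """Unify flat feature structures, treating NONE as a marker that
    is compatible only with an unspecified counterpart."""
    merged = {}
    for key in a.keys() | b.keys():
        if key in a and key in b:
            # Both sides specify the feature: NONE never matches here.
            if NONE in (a[key], b[key]) or a[key] != b[key]:
                return None
            merged[key] = a[key]
        else:
            merged[key] = a[key] if key in a else b[key]
    # The marker is a constraint rather than data, so drop it.
    return {k: v for k, v in merged.items() if v != NONE}


LEXICON = [
    {"class": "animal", "rank": "familia", "familia": "accipitridae",
     "category": "noun", "lexeme": "eagle"},
    {"class": "animal", "rank": "familia", "familia": "accipitridae",
     "maturity": "child", "category": "noun", "lexeme": "eaglet"},
]


def select(query, lexicon):
    """Return the lexeme of every entry that unifies with the query."""
    return [m["lexeme"] for m in (unify(query, e) for e in lexicon)
            if m is not None]


base = {"class": "animal", "rank": "familia", "familia": "accipitridae"}
print(select(base, LEXICON))                        # ['eagle', 'eaglet']
print(select({**base, "maturity": NONE}, LEXICON))  # ['eagle']
```

The second query shows the burden described above: the author has to know that `maturity` exists as a feature and exclude it by hand.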
The second issue is how to distinguish between information that:
- need not be expressed, but is provided to assist with word selection, or
- must be expressed, but can be subsumed into another word, or
- must be expressed, as a word in its own right.
For example, suppose there is a requirement to refer to a young, female sheep. In the first case the animal would be described as a ‘lamb’, because that is a more accurate word than ‘sheep’ if it is young. The fact that it is female would not be expressed in English, because (so far as I’m aware) there is no single word that expresses all three concepts together.
In the second case we are saying that ‘young’ and ‘female’ are essential to the meaning of the sentence and must be expressed somehow. The concept ‘young’ is subsumed into the meaning of ‘lamb’, but ‘female’ is not, therefore the output would be ‘female lamb’.
In the third case we are saying that ‘young’ and ‘female’ are so important that they must be expressed as entirely separate words. The output would therefore be ‘young female sheep’. Note that it needs to be a ‘sheep’ in this case, because ‘young lamb’, by encoding the age twice, would imply a very young animal (in much the same way that dark brown is very dark orange).
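Purely to pin down the expected output of the three cases, and not as a proposal for the eventual implementation, here is a toy sketch. The miniature lexicon, the feature names (`age`, `sex`) and the three mode labels (`optional`, `default`, `explicit`) are all assumptions made for this example.

```python
# Hypothetical miniature lexicon: each noun lists the features it subsumes.
NOUNS = {"sheep": {}, "lamb": {"age": "young"}}
MODIFIERS = {("age", "young"): "young", ("sex", "female"): "female"}


def realise(attributes):
    """attributes: list of (feature, value, mode) in output order.

    mode is 'optional' (need not be expressed), 'default' (must be
    expressed but may be subsumed) or 'explicit' (must be a separate word).
    """
    def absorbable(features):
        # A noun may absorb a feature only if the input supplies it
        # and has not marked it as explicit.
        return all(
            any(f == feat and v == val and mode != "explicit"
                for f, v, mode in attributes)
            for feat, val in features.items()
        )

    # Prefer the most specific noun that is allowed.
    candidates = [n for n, feats in NOUNS.items() if absorbable(feats)]
    noun = max(candidates, key=lambda n: len(NOUNS[n]))
    absorbed = NOUNS[noun]

    # Emit every non-optional attribute the noun did not absorb.
    words = [MODIFIERS[(f, v)] for f, v, mode in attributes
             if mode != "optional" and absorbed.get(f) != v]
    return " ".join(words + [noun])


for mode in ("optional", "default", "explicit"):
    print(mode, "->", realise([("age", "young", mode),
                               ("sex", "female", mode)]))
# optional -> lamb
# default -> female lamb
# explicit -> young female sheep
```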
How to implement this behaviour is a task for another day. My current concern is how to request these different types of behaviour from within the source language. For the current, predicate-based format this is straightforward and I had already reserved two tags for the purpose:
explicit — must be expressed separately
optional — need not be expressed
These would be applicable to both atomic and compound predicates, so (for example) tagging (dark orange) as explicit would prevent the compound as a whole from being merged with anything else, but should not prevent dark and orange from being expressed using the single word ‘brown’.
A further refinement would be to provide shortcuts to allow these tags to be added more concisely: ‘?’ for optional and some other symbol for explicit. This would allow notation of the form:
(young? female? zoo:species:ovis:ares)
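Reading such shortcuts back into tags could be as simple as the following sketch. The text leaves the symbol for explicit undecided, so the `!` used here is purely a placeholder assumption. The sketch handles only a flat list of atomic predicates, not nested compounds such as (dark orange).

```python
def parse_predicates(text, explicit_mark="!"):
    """Split a flat predicate list into (predicate, tag) pairs.

    A trailing '?' yields the tag 'optional'; a trailing explicit_mark
    (a placeholder symbol, not one fixed by the notation) yields
    'explicit'; anything else is untagged (None).
    """
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("expected a parenthesised predicate list")
    pairs = []
    for token in text[1:-1].split():
        if token.endswith("?"):
            pairs.append((token[:-1], "optional"))
        elif token.endswith(explicit_mark):
            pairs.append((token[:-1], "explicit"))
        else:
            pairs.append((token, None))
    return pairs


print(parse_predicates("(young? female? zoo:species:ovis:ares)"))
# [('young', 'optional'), ('female', 'optional'), ('zoo:species:ovis:ares', None)]
```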
Functional descriptions can be qualified in a very similar manner, but individual features can’t be if their values are atomic: to be able to qualify the value of a feature, that value must itself be a functional description. A more concise approach would be to use atomic-valued features for one of the alternatives and separate FDs for the other two:
- information provided solely by atomic-valued features is optional, whereas
- each item of essential information is placed in a separate FD.
I don’t have any objection to this solution in principle: it is very similar to my existing intent to use predicates for data and tags for metadata. However, despite my efforts above, the resulting source text would still be much more verbose than one based on predicates — so much so that I don’t think I could reasonably ask developers to produce source text in that format. For example:
(young female zoo:species:ovis:ares)
might become something like:
[category=np,
age=[class=maturity,maturity=child],
gender=[class=gender,gender=female],
head=[class=animal,rank=genus,genus=ares]]
It might be possible to shorten this by tweaking the syntax, but there seems little prospect of it improving on the conciseness or readability of the existing predicate-based notation. Since they convey essentially the same information, and since this is an external interface to the translation system which has to be easy to use, I think that makes a strong case for not using functional descriptions in the source text.
That doesn’t mean that there is no role for unification later in the translation process, and if there needs to be a conversion between two different formalisms then word selection would be a convenient point at which to do it.
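As a rough illustration of what such a conversion might involve, the sketch below maps the predicates from the example above onto the corresponding functional description. The lookup table `PREDICATE_TO_FEATURE` and the helper names are hypothetical. A real conversion would presumably be driven by the lexicon rather than by a hard-coded table.

```python
# Hypothetical mapping from predicate to (slot, functional description),
# copied from the worked example in the text.
PREDICATE_TO_FEATURE = {
    "young": ("age", {"class": "maturity", "maturity": "child"}),
    "female": ("gender", {"class": "gender", "gender": "female"}),
    "zoo:species:ovis:ares": (
        "head", {"class": "animal", "rank": "genus", "genus": "ares"}),
}


def to_fd(predicates, category="np"):
    """Build a nested functional description from a list of predicates."""
    fd = {"category": category}
    for predicate in predicates:
        slot, value = PREDICATE_TO_FEATURE[predicate]
        fd[slot] = dict(value)  # copy so the table is never mutated
    return fd


def render(fd):
    """Render a functional description in the bracketed notation."""
    parts = []
    for key, value in fd.items():
        shown = render(value) if isinstance(value, dict) else value
        parts.append(f"{key}={shown}")
    return "[" + ",".join(parts) + "]"


source = ["young", "female", "zoo:species:ovis:ares"]
print(render(to_fd(source)))
```

Keeping the predicate notation as the external format and generating the functional description internally means the verbosity is paid only inside the system.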