Names of Colours part 4: Implementation
Monday, August 31st, 2009

I’ve now had some experience working with the colour predicates described previously, and so far they have proved to be satisfactory. Certainly I have not yet had cause to wish that they were defined differently. However, there are a number of constructions which cannot currently be translated, due in large part to how the word selection algorithm works. Here is an outline of what works, what doesn’t, and how the situation could be improved.
Unqualified hues present no difficulty provided that the readings given cover all of the allowed predicates. This can and often does result in several readings for the same colour name. For example, both colour:azure and colour:blue have been translated as ‘blue’ in English.
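The requirement that the readings cover all of the allowed predicates is easy to check mechanically. The following Python sketch assumes a simple dictionary from predicate to English word; `READINGS_EN`, `ALLOWED_HUES` and the helper functions are hypothetical and not part of the actual language description.

```python
# Hypothetical English reading table: each colour predicate maps to a word.
# Several predicates may share the same word.
READINGS_EN = {
    "colour:azure": "blue",
    "colour:blue": "blue",
    "colour:orange": "orange",
}

# Hypothetical list of the hue predicates that must be translatable.
ALLOWED_HUES = ["colour:azure", "colour:blue", "colour:orange"]

def missing_readings(readings, allowed):
    """Return the allowed predicates that have no reading, in their given order."""
    return [p for p in allowed if p not in readings]

def predicates_for(readings, word):
    """Return every predicate translated as the given word, sorted."""
    return sorted(p for p, w in readings.items() if w == word)
```

An empty result from `missing_readings` means every allowed hue can be expressed, while `predicates_for(READINGS_EN, "blue")` lists the predicates that collapse onto ‘blue’.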
Hues qualified with colour:dark, colour:light or colour:bright also work as intended provided that they are bound together as a compound predicate, for example:
(colour:dark colour:orange) ⇒ ‘brown’
With a few additions to the language description this can be expanded to:
((colour:dark colour:orange) bio:genus:vulpes) ⇒ ‘brown fox’.
However, other permutations of these predicates have a less desirable surface form:
((colour:orange colour:dark) bio:genus:vulpes) ⇒ ‘orange dark fox’
(colour:dark (colour:orange bio:genus:vulpes)) ⇒ ‘dark orange fox’
(colour:orange (colour:dark bio:genus:vulpes)) ⇒ ‘orange dark fox’
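These outputs are consistent with a word selector that can only replace whole subtrees. The Python sketch below is an illustrative reconstruction, not the actual algorithm: terms are nested pairs (modifier, head), `LEXICON` is a hypothetical lexicon, and each subtree is looked up as a whole before falling back to its two halves. It reproduces the four fox examples given here.

```python
# Terms mirror the bracket notation: a term is either a predicate name (str)
# or a pair (modifier, head). This lexicon is hypothetical; its keys are
# whole subtrees, so one reading can cover the compound (dark orange).
LEXICON = {
    ("colour:dark", "colour:orange"): "brown",
    "colour:dark": "dark",
    "colour:orange": "orange",
    "bio:genus:vulpes": "fox",
}

def select_words(term):
    """Replace the whole term if it has a reading, otherwise split it in two.

    Only subtrees can be replaced, so 'brown' is found only when
    colour:dark and colour:orange are siblings in exactly that order.
    """
    if term in LEXICON:
        return [LEXICON[term]]
    if isinstance(term, tuple):
        modifier, head = term
        # Assumption: modifiers are rendered before their heads, as in English.
        return select_words(modifier) + select_words(head)
    raise KeyError(f"no reading for {term!r}")

def render(term):
    return " ".join(select_words(term))
```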
The question is, should the translation system do better with these inputs, or should these inputs be avoided?
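A partial answer to this question is that it shouldn’t matter whether colour:dark or colour:orange is specified first, because the effect of these predicates on the membership function is linear. By this I mean that if f(x) represents darkness and g(x) represents orangeness, then applying them to a membership function μ(x) in either order gives:

$$\mathrm{dark}(\mathrm{orange}(\mu))(x) = f(x)\,g(x)\,\mu(x), \qquad \mathrm{orange}(\mathrm{dark}(\mu))(x) = g(x)\,f(x)\,\mu(x)$$

Since multiplication is commutative it follows that:

$$\mathrm{dark}(\mathrm{orange}(\mu))(x) = \mathrm{orange}(\mathrm{dark}(\mu))(x)$$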
This does not mean that (colour:dark colour:orange) and (colour:orange colour:dark) should necessarily produce the same output, but the surface forms should at least be of similar quality, which is clearly not the case at present.
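The commutativity argument can be checked numerically. In this Python sketch each predicate scales an existing membership function by a factor; the colour representation (hue in degrees, lightness in [0, 1]) and the shapes of `darkness` and `orangeness` are placeholders chosen for illustration, not the real predicate definitions.

```python
import math

# Placeholder membership factors on a colour given as (hue in degrees,
# lightness in [0, 1]). The shapes are assumptions for illustration only;
# hue wrap-around at 360 degrees is ignored.
def darkness(colour):
    _, lightness = colour
    return 1.0 - lightness

def orangeness(colour):
    hue, _ = colour
    # Triangular peak at 30 degrees, reaching zero 30 degrees either side.
    return max(0.0, 1.0 - abs(hue - 30.0) / 30.0)

def qualify(factor, membership):
    """Apply a predicate: scale an existing membership function by its factor."""
    return lambda colour: factor(colour) * membership(colour)

def anything(colour):
    """Membership function that accepts every colour fully."""
    return 1.0

dark_orange = qualify(darkness, qualify(orangeness, anything))
orange_dark = qualify(orangeness, qualify(darkness, anything))

if __name__ == "__main__":
    for colour in [(30.0, 0.25), (45.0, 0.5), (200.0, 0.1)]:
        a, b = dark_orange(colour), orange_dark(colour)
        print(colour, a, b, math.isclose(a, b))
```

Because `qualify` only multiplies, the order of application cannot change the result, which is exactly the property the equations rely on.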
One solution would be to provide two readings for the word ‘brown’, but this would be inelegant, and would scale poorly if more than two predicates were involved. The alternatives are to improve the word selection algorithm so that it recognises when permutations are equivalent to each other, or to force the predicates into a particular canonical order.
Many languages have a preferred order for adjectives, so some reordering will be needed whether or not it is a requirement for word selection. For those languages which don’t have a preferred order, there is no reason why one can’t be imposed anyway. Even for those languages which use adjective order to indicate emphasis, there is no need to preserve the original order of the predicates, because that would not be a correct way to deduce what should be emphasised.
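Canonicalisation could be as simple as sorting sibling predicates by a fixed rank before word selection. In this Python sketch the `RANK` table (lightness qualifiers before hues) and the `canonicalise` function are assumptions for illustration; only pairs of ranked predicates are swapped, so nouns are never moved.

```python
# Hypothetical ranks used to impose a canonical order on colour predicates:
# lightness qualifiers come before hues. Only predicates listed here are
# ever reordered; anything else (such as a noun) keeps its position.
RANK = {
    "colour:dark": 0, "colour:light": 0, "colour:bright": 0,
    "colour:orange": 1, "colour:blue": 1, "colour:azure": 1,
}

def canonicalise(term):
    """Return term with every pair of two ranked predicates put in rank order.

    A term is a predicate name (str) or a pair (modifier, head).
    """
    if isinstance(term, str):
        return term
    modifier, head = canonicalise(term[0]), canonicalise(term[1])
    # A nested pair is never a key of RANK, so only sibling predicates swap.
    if modifier in RANK and head in RANK and RANK[head] < RANK[modifier]:
        modifier, head = head, modifier
    return (modifier, head)
```

This turns ((colour:orange colour:dark) bio:genus:vulpes) into ((colour:dark colour:orange) bio:genus:vulpes), but leaves (colour:orange (colour:dark bio:genus:vulpes)) unchanged, because there the two colour predicates are not siblings.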
However, I can see one situation where reordering won’t help. Where a noun encompasses the meaning of one or more adjectives (such as ‘lamb’ or ‘ewe’ in place of ‘sheep’), there is no guarantee that the predicates replaced will be canonically adjacent to each other (for example, in ‘young black sheep’). For this reason I think improvements to the word selection algorithm will be needed, even if canonicalisation is introduced too.
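One way word selection could cope with this is to let each reading cover a set of predicates rather than a subtree. The Python sketch below uses a greedy cover over a flat set; the predicate names `age:young` and `bio:genus:ovis`, the part-of-speech tags and the greedy strategy are all assumptions for illustration. Greedy selection is not guaranteed to find the best cover in general, and discarding the tree structure is only safe for descriptive, non-conflicting predicates.

```python
# Hypothetical lexicon: each reading covers a *set* of predicates, so a
# noun such as 'lamb' can absorb an adjective that is not adjacent to it.
LEXICON = [
    (frozenset({"age:young", "bio:genus:ovis"}), "lamb", "noun"),
    (frozenset({"bio:genus:ovis"}), "sheep", "noun"),
    (frozenset({"age:young"}), "young", "adj"),
    (frozenset({"colour:black"}), "black", "adj"),
]

def select_words(predicates):
    """Greedily pick the reading covering the most remaining predicates."""
    remaining = set(predicates)
    chosen = []
    while remaining:
        candidates = [entry for entry in LEXICON if entry[0] <= remaining]
        if not candidates:
            raise KeyError(f"no reading for {sorted(remaining)}")
        # max() returns the first entry among equally large candidates.
        best = max(candidates, key=lambda entry: len(entry[0]))
        chosen.append(best)
        remaining -= best[0]
    # Assumption: adjectives precede nouns, as in English.
    adjectives = [word for _, word, pos in chosen if pos == "adj"]
    nouns = [word for _, word, pos in chosen if pos == "noun"]
    return " ".join(adjectives + nouns)
```

With the predicates for ‘young black sheep’ this selects ‘lamb’ first and then ‘black’, even though the young and sheep predicates are not adjacent in any canonical order.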
Regarding the question of associativity, (colour:dark (colour:orange bio:genus:vulpes)) is certainly acceptable: it merely applies the three predicates in sequence. As they are all descriptive and do not contradict each other, there is no reason why this shouldn’t happen, but it is not something that the translation system can currently handle.
The colour in isolation, however, must be expressed as (colour:dark colour:orange) (or vice versa), because there is no other way in which two predicates can be combined. It follows that all of these forms need to be matched. The options are similar to before, except that there will almost certainly need to be changes to word selection (because the current system cannot replace a set of predicates which do not form a subtree).
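Matching a set of predicates that need not form a subtree can be sketched by flattening each term before comparing. In this Python sketch `flatten`, `contains_compound`, `DARK_ORANGE` and `FORMS` are hypothetical names; the comparison ignores both bracketing and order, which is only reasonable for descriptive, mutually compatible predicates like these.

```python
def flatten(term):
    """Collect the predicates of a binary term from left to right.

    A term is a predicate name (str) or a pair (modifier, head).
    """
    if isinstance(term, str):
        return [term]
    modifier, head = term
    return flatten(modifier) + flatten(head)

def contains_compound(term, compound):
    """True if every predicate of compound occurs somewhere in term,
    regardless of bracketing or order (multiplicity is ignored)."""
    return set(flatten(compound)) <= set(flatten(term))

DARK_ORANGE = ("colour:dark", "colour:orange")

# The four fox permutations discussed earlier.
FORMS = [
    (("colour:dark", "colour:orange"), "bio:genus:vulpes"),
    (("colour:orange", "colour:dark"), "bio:genus:vulpes"),
    ("colour:dark", ("colour:orange", "bio:genus:vulpes")),
    ("colour:orange", ("colour:dark", "bio:genus:vulpes")),
]
```

Every entry in `FORMS` contains the dark-orange compound under this test, including the two where it is not a subtree.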
It has occurred to me that I may be making life unnecessarily difficult for myself by using an explicit binary tree structure as opposed to something more akin to the list structures used in Lisp. In the latter case there is a terminator at the end of the list, so there is no structural difference when colour:dark and colour:orange are applied to each other or applied to something else. This would be a radical change that would affect the whole translation system, but I think it is worth considering.
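To make the contrast concrete, here is a minimal Python sketch of Lisp-style lists built from cons cells, with `None` standing in for the terminator. The names `Cons`, `make_list` and `starts_with` are hypothetical. The colour on its own and the colour applied to a noun share the same prefix, so a reading for (colour:dark colour:orange) could match either by examining the front of the list.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Cons:
    """One cell of a singly linked list: a predicate and the rest of the list."""
    head: str
    tail: Optional["Cons"]  # None plays the role of Lisp's NIL terminator

def make_list(*predicates):
    """Build a terminated list from the given predicates, in order."""
    result = None
    for p in reversed(predicates):
        result = Cons(p, result)
    return result

def starts_with(lst, prefix):
    """True if lst begins with the predicates in prefix (a Python sequence)."""
    for p in prefix:
        if lst is None or lst.head != p:
            return False
        lst = lst.tail
    return True
```

Here `make_list("colour:dark", "colour:orange")` and `make_list("colour:dark", "colour:orange", "bio:genus:vulpes")` differ only in what follows the second cell, which is the structural uniformity a list representation would provide.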