Fallbacks
Thursday, October 22nd, 2009
One of the main challenges that the translation system must address is how to behave when there is no word in the target language to express one of the concepts requested. I intend to address this by providing fallback translations which are available across all languages. There will be two types of fallback:
- Fallbacks to a description. For example, an ‘aeroplane’ might be described as a ‘flying machine’, or a ‘desk’ as a ‘writing table’. These are not perfect replacements, but in a language with limited vocabulary there may be no better way to convey the required meaning.
- Fallbacks to a reading. Sometimes it is better to borrow a foreign word than to attempt any form of translation. Good examples of this would be animal names such as ‘kangaroo’ or ‘meerkat’ (which made their way into English through this mechanism). Any useful description would be unreasonably long, and falling back to ‘marsupial’ or ‘mongoose’ is unlikely to be helpful.
It is also worth mentioning one other type of construct which, though not strictly a fallback, has similar behaviour in that it can be translated without the need for an explicit reading:
- Aliases. These are predicates which can be exactly expressed in terms of other, more primitive predicates. The alias exists only as a convenience for those writing source texts and language descriptions. Any analysis occurs after decomposition into primitives.
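To make the idea of decomposition concrete, here is a minimal Python sketch of alias expansion. It assumes expressions are modelled as nested tuples of predicate names; that representation, the function name `expand` and the example predicates are all hypothetical and not part of the system described here.

```python
def expand(expr, aliases, _active=()):
    """Recursively replace aliases with their definitions.

    expr    -- a predicate name (str) or a tuple of sub-expressions
    aliases -- dict mapping an alias name to the expression it stands for
    """
    if isinstance(expr, tuple):
        return tuple(expand(sub, aliases, _active) for sub in expr)
    if expr in _active:
        # An alias defined (directly or indirectly) in terms of itself
        # can never reach primitives, so report it instead of looping.
        raise ValueError(f"circular alias: {expr}")
    if expr in aliases:
        return expand(aliases[expr], aliases, _active + (expr,))
    return expr  # already a primitive predicate


# Hypothetical aliases, for illustration only.
aliases = {
    "bachelor": ("unmarried", "man"),
    "man": ("adult", "male-human"),
}
primitive_form = expand(("tall", "bachelor"), aliases)
```

Any later analysis would then see only `primitive_form`, never the alias names themselves.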
The natural place for descriptive fallbacks to be specified is in the corresponding predicate definition. (One consequence of this is that descriptive fallbacks will only be able to replace atomic predicates, not compounds. I think this is a reasonable restriction.) I’ve chosen the following syntax:
predicate vehicle:aeroplane
{
fallback ((for-purpose-of flying) machine);
};
The intended behaviour is straightforward: if no reading is found then the predicate is replaced by the fallback expression and an attempt is made to translate that instead. I would like to allow multiple fallbacks (to be tried sequentially until one translates successfully) provided that this is not overly difficult to implement.
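The lookup order described above (reading first, then each fallback in turn) could be sketched in Python roughly as follows. This is only an illustration under several assumptions: expressions are flat or nested tuples of predicate names, a compound is translated by concatenating the translations of its parts, and the nested fallback `((for-purpose-of flying) machine)` is flattened to `("flying", "machine")` for brevity. The function name and the `sky`/`ship` fallback are hypothetical.

```python
def translate(expr, readings, fallbacks, _seen=frozenset()):
    """Return a list of target-language words, or None if untranslatable.

    readings  -- dict: predicate name -> word in the target language
    fallbacks -- dict: predicate name -> list of fallback expressions,
                 tried in order until one translates successfully
    """
    if isinstance(expr, tuple):
        words = []
        for sub in expr:
            part = translate(sub, readings, fallbacks, _seen)
            if part is None:
                return None  # one untranslatable part sinks the compound
            words.extend(part)
        return words
    if expr in readings:
        return [readings[expr]]  # an ordinary reading always wins
    if expr in _seen:
        return None  # guard against fallbacks that refer to each other
    for alternative in fallbacks.get(expr, []):
        result = translate(alternative, readings, fallbacks, _seen | {expr})
        if result is not None:
            return result
    return None


# A toy language with a limited vocabulary (illustrative data only).
readings = {"flying": "flying", "machine": "machine"}
fallbacks = {"vehicle:aeroplane": [("sky", "ship"), ("flying", "machine")]}

words = translate("vehicle:aeroplane", readings, fallbacks)
```

Here the first fallback fails because the toy language has no reading for `sky`, so the second is used. Trying the alternatives sequentially adds only a single loop, which suggests that multiple fallbacks need not be difficult to support.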
Originally I had intended to make fallback readings part of the predicate definition too, but eventually decided that there is no need, because ordinary readings already provide all of the functionality required:
reading meerkat[noun] = zoo:species:suricata:suricatta;
Should an attempt be made to classify fallback readings into parts of speech (as in the example above), or should they simply be tagged as ‘foreign words’ which are somehow outside the grammar of the target language? My expectation is that most fallback readings will be noun-like, but if there are differences then it must surely be useful for the target language to know about them (and if there are none then ‘foreign word’ is simply an alias for noun).
This will make it necessary for parts of speech used in fallback readings to be standardised across languages, so French would use tags such as ‘noun’ and ‘adjective’ as opposed to ‘substantif’ and ‘adjectif’. Fortunately I’ve been doing this anyway (albeit largely for my own convenience rather than as part of any grand plan). The same will apply to other types of tag such as ‘singular’ and ‘plural’.
Should there be any attempt to create default inflection rules for fallback readings? I think probably yes. When words are borrowed from one language to another they often retain their original morphology in the first instance. At the very least it can’t do any harm to know which language the fallback was taken from. If a language wants to follow its own rules regardless of this information then it can override the default.
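One way to picture default inflection with an override is the Python sketch below. Everything in it is an assumption made for illustration: the rule tables, the language codes, the function name `pluralise` and the deliberately crude plural rules, which are toys rather than real morphology.

```python
# Default plural rules keyed by the language the word was borrowed from.
# These are toy rules for illustration, not serious morphology.
DEFAULT_PLURAL = {
    "en": lambda w: w + "s",
    "la": lambda w: w[:-2] + "i" if w.endswith("us") else w,
}


def pluralise(word, source_lang, override=None):
    """Inflect a borrowed word.

    override -- optional rule supplied by the target language; if given,
                it replaces the default taken from the source language.
    """
    rule = override or DEFAULT_PLURAL.get(source_lang)
    # With no rule at all, leave the word uninflected rather than fail.
    return rule(word) if rule else word


borrowed = pluralise("meerkat", "en")                              # source rule
native = pluralise("meerkat", "en", override=lambda w: w + "er")   # target rule
```

The point of the sketch is only the precedence: the target language's own rule is used when one is supplied, and the source language's morphology is the default otherwise.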
I’m aware that different languages have different parts of speech, and that while most languages have words which pass for nouns, adjectives, verbs and adverbs, that does not mean they have the same semantics or behaviour as English nouns, adjectives, verbs and adverbs. I also appreciate that inflecting completely alien words could prove to be difficult in some languages (although knowing that they are alien should help). However this is about producing something when the alternative is to provide nothing, so perfection is not a requirement.
Should languages omit readings if there is a suitable fallback reading? That’s a tricky question. On the one hand, answering no results in a large amount of duplication within the language description files, which is surely undesirable. On the other hand, I do think that lists of differences will be more difficult to write, check and maintain, and that automatic propagation of changes is not desirable in the way that it is between dialects.
A good example of this is chemical elements in Danish. There is very nearly a 50-50 split between native and international names, so any saving would not be of the same order as (for example) British versus US English where there are only three differences. If a full list is given then it is possible to check that every element has been considered, whereas a partial list cannot be checked for completeness without redoing much of the research used to compile it in the first place.
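The completeness check that a full list makes possible can be sketched in a few lines of Python. The predicate names (`element:hydrogen` and so on) and the function name are hypothetical, and the Danish data is a small sample used only to show the check.

```python
def unreviewed(concepts, readings):
    """Return the concepts that have no entry in the language's readings."""
    return sorted(set(concepts) - set(readings))


# Hypothetical predicate names for a handful of elements.
elements = ["element:hydrogen", "element:helium", "element:oxygen", "element:neon"]

# A Danish list that is meant to be complete but has one omission.
danish = {
    "element:hydrogen": "brint",   # native name
    "element:helium": "helium",    # international name
    "element:oxygen": "ilt",       # native name
}

missing = unreviewed(elements, danish)
```

With a list of differences, the absence of `element:neon` would be indistinguishable from a deliberate decision to use the fallback, so no such check could be made.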
For these reasons I’m minded to stick with full lists for the time being. However when compilation is introduced it would certainly be possible for any redundant readings to be automatically removed by the compiler, and I see no harm in that. Finally, I would only intend to duplicate readings for words which have been or are being assimilated into the language in question. Where a language has no word for a concept, the fallback will not be duplicated.
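The automatic removal of redundant readings mentioned above might look something like the following Python sketch. It assumes that both the language's readings and the shared fallback readings can be viewed as simple mappings from predicate to word, ignoring parts of speech and other tags; the predicate names and the function name are hypothetical.

```python
def prune_redundant(language_readings, fallback_readings):
    """Drop readings that merely repeat the shared fallback reading."""
    return {
        predicate: word
        for predicate, word in language_readings.items()
        if fallback_readings.get(predicate) != word
    }


# Shared fallback readings (international names), illustrative only.
fallback_readings = {
    "element:hydrogen": "hydrogen",
    "element:helium": "helium",
    "element:oxygen": "oxygen",
    "element:neon": "neon",
}

# The full, hand-maintained Danish list.
danish = {
    "element:hydrogen": "brint",
    "element:helium": "helium",
    "element:oxygen": "ilt",
    "element:neon": "neon",
}

compiled = prune_redundant(danish, fallback_readings)
```

The source file keeps the full list so that it can still be checked for completeness, while the compiled form carries only the entries that actually differ from the fallback.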