Geography: Adjectival Forms
Sunday, April 26th, 2009
Many geographical names have a corresponding adjectival form, for example:
- Africa (African)
- Mongolia (Mongolian)
- Cornwall (Cornish)
- Liverpool (Liverpudlian)
The syntactic behaviour of these terms is straightforward but their semantics are not:
- ‘Australian wine’ is wine which originated from Australia (indicating origin).
- ‘Chinese food’ is food of a type which originated from China (indicating origin, but of the type rather than the food itself).
- an ‘American state’ is one which forms part of the United States of America (indicating inalienable possession).
- ‘Russian gold reserves’ are gold reserves owned by Russia (indicating alienable possession).
- the ‘English victory at Agincourt’ was a victory by England (indicating the identity of the agent).
- the ‘French defeat at Waterloo’ was a defeat of France (indicating the identity of the patient).
(I’ve excluded from this list idiomatic usage such as ‘Chinese whispers’ or ‘Spanish practices’ because idioms can only be handled as special cases: it is simply not possible to deduce their meaning analytically [1]. Also excluded is usage referring to the language rather than the location, such as ‘Italian verb’, because the meaning would not then be expressed in terms of the geographical predicate.)
This is not an issue of ambiguity. On the contrary, in any given context the meaning of the adjective is usually well-defined even if (and this is the important point) more than one of the options is physically plausible. This is particularly apparent in the case of nouns like ‘defeat’ where there can be both an agent and a patient. In the examples given above it is perfectly clear as a matter of language who occupies which role: it is not necessary to know military history to work it out.
To make matters even more complicated, the role can depend on more than just the noun. For example, when referring to the ‘English defeat of the Spanish Armada’ it is clear that England is the agent, not the patient. (One way of explaining this effect would be to take the view that ‘English’ is not acting directly on the noun ‘defeat’, but rather on the noun phrase ‘defeat of the Spanish Armada’. Since this is a different concept with different characteristics, the fact that it casts the modifying adjective in a different role is unsurprising.)
I don’t think it is feasible to fully address issues like these in the lexicon. For starters I have no particular desire to list half a dozen separate readings against each adjective. Even if I did, this would not give the correct behaviour because in addition to the allowed usage it would also permit a wide variety of invalid usage.
My tentative solution is therefore to introduce a level of indirection so that only one reading is needed, and so that there is more opportunity for rules to influence the word selection process. (At present the only rules that execute prior to word selection are decomposition rules, but it was always likely that would change.) The specific mechanism I’m proposing is as follows:
- A predicate is introduced for internal use within language description files when specifying the meaning of adjectival forms such as ‘English’ and ‘French’. I’m going to call this adjective:genitive.
- This predicate will not have any fixed meaning, but rather will represent the difference between the adjectival form and the noun (‘English’ vs ‘England’, ‘French’ vs ‘France’).
- Further predicates are introduced to represent more specific relationships, such as alienable possession, in cases where use of the adjectival form would be permissible. I’m going to give these names of the form adjective:possessive:alienable.
- There are no readings for the specific predicates, so generation occurs only by means of fallbacks.
- The first fallback is to adjective:genitive, so if a suitable adjectival form exists then it is used.
- There is a second fallback to an appropriate preposition (such as ‘of’ or ‘by’) for use when the adjectival form is unsuitable or nonexistent.
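To make the fallback chain above a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the actual word selection code: the function realise, the tables ADJECTIVAL_FORMS and PREPOSITIONS, the adjective_allowed flag, and the predicate names adjective:agent and adjective:patient are all hypothetical (only adjective:genitive and adjective:possessive:alienable are named above).

```python
# Hypothetical readings for adjective:genitive: place name -> adjectival form.
ADJECTIVAL_FORMS = {
    "England": "English",
    "France": "French",
    "Russia": "Russian",
}

# Hypothetical second-fallback prepositions, one per specific predicate.
# The specific predicates have no readings of their own.
PREPOSITIONS = {
    "adjective:possessive:alienable": "of",
    "adjective:agent": "by",    # assumed name, not from the text
    "adjective:patient": "of",  # assumed name, not from the text
}


def realise(predicate: str, place: str, noun: str,
            adjective_allowed: bool = True) -> str:
    """Generate a phrase for a specific predicate using only fallbacks."""
    if predicate not in PREPOSITIONS:
        raise ValueError(f"unknown predicate: {predicate}")

    # First fallback: adjective:genitive, used if a suitable form exists.
    adjective = ADJECTIVAL_FORMS.get(place) if adjective_allowed else None
    if adjective is not None:
        return f"{adjective} {noun}"

    # Second fallback: an appropriate preposition.
    return f"{noun} {PREPOSITIONS[predicate]} {place}"


print(realise("adjective:possessive:alienable", "Russia", "gold reserves"))
print(realise("adjective:agent", "Monaco", "victory"))
```

In this sketch the decision about whether the adjectival form is suitable is reduced to a single flag; in practice that decision is exactly where rules would need to intervene before word selection.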
This is a fairly complicated arrangement, but having looked at the alternatives I’m satisfied that it is warranted. It will not be possible to implement it until word selection is upgraded to allow fallbacks. One redeeming characteristic is that it need not stand in the way of adding lexicon entries: these can be defined in terms of adjective:genitive even if there is no supported method for generating it.
[1] You might ask why terms such as these need to be handled at all if the aim is text generation rather than analysis. There is a good reason: to prevent the term from being generated in inappropriate circumstances if its idiomatic meaning would override the natural one. Also, I want the lexicon to be usable in both directions even if the grammar is not, simply because the cost is low and the potential future benefit is large.