Archive for December, 2008

Ambiguity versus imprecision

Monday, December 22nd, 2008

One of the main design goals of the source language (BabelScript) is to avoid ambiguity, but this does not mean that information must be conveyed in perfect detail. On the contrary, it has become very clear as I’ve started writing predicate definitions that a degree of imprecision is often a necessity.

For example, consider the statement “the sky is blue”. What is the meaning of ‘blue’ in this sentence? The shade known as ‘sky blue’ is an obvious possibility, but the writer hasn’t said that, and in reality the colour of the sky varies enormously depending on the time of day, weather, and zenith angle. There is not enough information for the reader to deduce whether the sky is actually sky blue, or medium blue, or some other shade. Significantly, it is quite conceivable that the writer did not know the precise colour (or had not given the matter any thought).

Does this make the statement defective in some way? Not at all. There are details which it does not specify, but that is true of most writing. Indeed, natural language would be considerably less useful if it were not able to convey incomplete or imprecise information: it should be possible to comment on the colour of the sky without first measuring it with a photometer.

Why, then, all of the fuss about avoiding ambiguity? The concern is to avoid creating predicates that require a deep understanding of the context before they can be translated.

An example is the English word ‘revolting’, which can refer either to a cause of revulsion or to an act of rebellion. Individual sentences, such as “the peasants are revolting”, do not necessarily contain enough information to deduce what is intended, but the intent must be known prior to translation because other languages are likely to use different words for the two concepts.

Key differences between ambiguity and imprecision are that:

  1. Ambiguities must be resolved by the reader in order to correctly interpret the text, whereas imprecisions need not be.
  2. Ambiguities typically correspond to large (often qualitative) differences of meaning, imprecisions to smaller (usually quantitative) variations.
  3. Imprecision is a useful tool for conveying incomplete information, whereas ambiguity is useful only where the intent is to pun or dissemble.

If imprecision is considered useful then some means should be found for representing it within the source language. One way in which this can be done is through a class hierarchy, an example being the taxonomic classification of animals and plants. This gives the means to identify an individual species if that information is known (such as Vulpes vulpes, the red fox), but also provides the option of stating only the genus (Vulpes, foxes), the family (Canidae, dogs), the order (Carnivora, carnivores) or the class (Mammalia, mammals). Some species are subdivided into subspecies, allowing even greater precision.
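As a rough illustration of how a hierarchy lets a writer stop at whatever level of detail is known, here is a minimal Python sketch. It is not BabelScript: the `PARENT` table and the `ancestors` and `is_a` helpers are hypothetical names invented for this example, and only the fox lineage mentioned above is included.

```python
# Hypothetical sketch: each taxon records the broader class directly above it,
# so a statement can name the species, the genus, the family, and so on.
PARENT = {
    "Vulpes vulpes": "Vulpes",   # species -> genus
    "Vulpes": "Canidae",         # genus -> family
    "Canidae": "Carnivora",      # family -> order
    "Carnivora": "Mammalia",     # order -> class
    "Mammalia": None,            # top of this small example
}


def ancestors(taxon):
    """Yield the taxon itself, then each successively broader class."""
    while taxon is not None:
        yield taxon
        taxon = PARENT[taxon]


def is_a(specific, general):
    """True if `specific` falls within `general` (or is the same taxon)."""
    return general in ancestors(specific)
```

Under these assumptions, `is_a("Vulpes vulpes", "Canidae")` holds while `is_a("Canidae", "Vulpes")` does not: saying "a dog of some kind" is less precise than saying "a red fox", but it is not ambiguous.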

An alternative method is to take a predicate with a more precise meaning, then qualify it in some way to indicate that the precision should be reduced. This is a process which often happens in natural language (hence words such as reddish, smallish, and roundish), and is particularly useful where the imprecise concepts would otherwise be difficult to name.
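The qualifier approach might be sketched as follows, again in Python rather than BabelScript. Everything here is an assumption made for illustration: the `HuePredicate` class, the `ish` qualifier, the choice of hue angle as the measured quantity, and the particular tolerances are not taken from any actual predicate definition.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HuePredicate:
    """Hypothetical colour predicate: true for hues near a centre value."""
    name: str
    centre: float     # hue angle in degrees on a 0-360 colour wheel
    tolerance: float  # accepted distance from the centre, in degrees

    def holds(self, hue):
        # Shortest angular distance, allowing for wrap-around at 360.
        diff = abs(hue - self.centre) % 360
        return min(diff, 360 - diff) <= self.tolerance


def ish(pred, factor=3.0):
    """Qualifier that reduces precision by widening the accepted range."""
    return replace(pred, name=pred.name + "-ish",
                   tolerance=pred.tolerance * factor)


# Illustrative values only: 'red' centred on hue 0 with a narrow range.
RED = HuePredicate("red", centre=0.0, tolerance=10.0)
REDDISH = ish(RED)
```

With these made-up numbers a hue of 25 degrees is not red but is reddish. The point of the design is that the imprecise concept needs no name of its own: it is derived from the precise one.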

Whichever method is chosen, the aim should be to approximate a similar level of granularity to that present in typical natural languages. For this reason, it may be appropriate for the degree of granularity to vary between different parts of a hierarchy. (Compare, for example, the phylum Chordata - which includes all mammals, birds, reptiles, amphibians and fish - with the phylum Nematoda - which consists entirely of nematode worms.)

Finally, there will be a need to identify what are known as ‘base level’ concepts. These represent the preferred level of detail if there is no good reason for specifying more or less. Exactly what form this will take has not been decided yet, but my working assumption is that it is a language-dependent phenomenon and therefore belongs in the language definition, not the predicate dictionary.

Predicate semantics: adjectives or abstract nouns?

Tuesday, December 16th, 2008

A topic I’ve touched on before, but which deserves a more thorough explanation, is exactly what semantics are attached to a predicate when it corresponds to a concept such as the colour green, or the metal iron, or the property of being triangular in shape. There are two possibilities that I’ve considered:

  • the predicate is true for anything that is green, or which is composed of iron, or which is triangular in shape;
  • the predicate is true only for the abstract entities of the colour green, or the element iron, or the shape of a triangle.

There is a distinction between the two, most clearly shown in the case of shapes: the difference between saying that “x is triangular” and “x is a triangle”. Since it results in different text there is a clear need to represent this distinction. That is straightforward enough, but the method needed to achieve it depends on the semantics chosen:

  • if a predicate is true for anything green then it can be qualified by a second attribute true for anything that is an abstract colour. The combination is true for anything that is the abstract colour green.
  • if a predicate is true for the abstract colour green then it can be qualified by a second attribute meaning ‘is-coloured’. The combination is true for anything that is coloured green.
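The first of these two schemes can be sketched in Python by treating predicates as functions and qualification as conjunction. This is only an illustration under stated assumptions: the `Entity` class, the property labels, and the `both` combinator are hypothetical and say nothing about how BabelScript actually composes predicates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical entity described by a set of property labels."""
    label: str
    properties: frozenset


def green(x):
    # Adjective-like semantics: true for anything that is green.
    return "green" in x.properties


def abstract_colour(x):
    # True for anything that is an abstract colour.
    return "abstract-colour" in x.properties


def both(p, q):
    """Compound predicate: true only where both components are true."""
    return lambda x: p(x) and q(x)


# 'The abstract colour green' expressed as a compound of two predicates.
the_colour_green = both(green, abstract_colour)

leaf = Entity("a leaf", frozenset({"green"}))
green_itself = Entity("the colour green", frozenset({"green", "abstract-colour"}))
red_itself = Entity("the colour red", frozenset({"abstract-colour"}))
```

Here `green` is true of both the leaf and the colour green, while `the_colour_green` is true only of the latter. The three sample entities also show that the two component predicates can vary independently of each other.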

I favour the first option for two reasons, one theoretical and one practical. The theoretical consideration is one of orthogonality: a desire to create predicates that are fully independent of each other. The property of being green is orthogonal to the property of being an abstract colour: being one neither implies nor prohibits the other. A predicate corresponding to the abstract colour green fails to separate these concepts. I therefore conclude that the first two properties (being green, and being an abstract colour) are suitable candidates to be represented by atomic predicates, whereas the third (being the abstract colour green) is more naturally represented by a compound.

The practical issue is that it is much more common to talk about objects that are green than about the colour green itself. Given the choice, it is preferable for the more frequently used form to be the more concise one. Although this is ultimately just a matter of convenience, I would attach significant weight to it: having to refer to a fox that ‘is coloured brown’ and with ‘movement that is quick’ would soon become very tiresome.

Fortunately both arguments lead in the same direction, so I think the decision is an easy one. There may sometimes be a need to explicitly spell out concepts such as ‘coloured green’, in which case a method will need to be found to express that, but my intention is that unqualified predicates such as ‘green’ or ‘brown’ will correspond semantically to adjectives and not to abstract nouns.