Tags in Predicate Definitions

November 1st, 2009

There are several stages of the translation process during which tags can be applied, but currently almost all of the raw information on which they are based is provided by the lexicon. Some of that information is language-independent, and I want to move it into the predicate dictionary so that it can be shared between languages. Examples of the type of information I have in mind are the fact that a predicate represents:

  • a colour, or
  • a substance, or
  • an animal.

As I’ve previously indicated, language-independence need not be absolute: if a particular language needs to handle a particular predicate differently then it can be overridden. I wouldn’t want to over-use this facility, because attributes which are substantially language-dependent belong in the lexicon, but it will simplify the handling of edge cases and idiosyncrasies.

The syntax for attaching tags to a predicate is as follows:

predicate foo
{
  tag bar,baz,quux;
};

Multiple tag statements are permitted as an alternative to listing them on one line. Tags are applied as the very first stage of the translation process. I’ve also made some changes to later stages so that tags survive up to and beyond word selection.

What can’t easily be achieved at this point is tagging of composite predicates prior to word selection, so while the system can be told that zoo:genus:vulpes has the characteristic of being animate, it is not able to deduce that (colour:red zoo:genus:vulpes) is equally animate. This is an issue I intend to address soon.

Fallbacks

October 22nd, 2009

One of the main challenges that the translation system must address is how to behave when there is no word in the target language to express one of the concepts requested. I intend to address this by providing fallback translations which are available across all languages. There will be two types of fallback:

  • Fallbacks to a description. For example, an ‘aeroplane’ might be described as a ‘flying machine’, or a ‘desk’ as a ‘writing table’. These are not perfect replacements, but in a language with limited vocabulary there may be no better way to convey the required meaning.
  • Fallbacks to a reading. Sometimes it is better to borrow a foreign word than to attempt any form of translation. Good examples of this would be animal names such as ‘kangaroo’ or ‘meerkat’ (which made their way into English through this mechanism). Any useful description would be unreasonably long, and falling back to ‘marsupial’ or ‘mongoose’ is unlikely to be helpful.

It is also worth mentioning one other type of construct which, though not strictly a fallback, has similar behaviour in that it can be translated without the need for an explicit reading:

  • Aliases. These are predicates which can be exactly expressed in terms of other, more primitive predicates. The alias exists only as a convenience for those writing source texts and language descriptions. Any analysis occurs after decomposition into primitives.

The natural place for descriptive fallbacks to be specified is in the corresponding predicate definition. (One consequence of this is that descriptive fallbacks will only be able to replace atomic predicates, not compounds. I think this is a reasonable restriction.) I’ve chosen the following syntax:

predicate vehicle:aeroplane
{
  fallback ((for-purpose-of flying) machine);
};

The intended behaviour is straightforward: if no reading is found then the predicate is replaced by the fallback expression and an attempt made to translate that. I would like to allow multiple fallbacks (to be tried sequentially until one translates successfully) provided that this is not overly difficult to implement.

Originally I had intended to make fallback readings part of the predicate definition too, but eventually decided that there is no need: ordinary readings already provide all of the functionality that is needed:

reading meerkat[noun] = zoo:species:suricata:suricatta;

Should an attempt be made to classify fallback readings into parts of speech (as in the example above), or should they simply be tagged as ‘foreign words’ which are somehow outside the grammar of the target language? My expectation is that most fallback readings will be noun-like, but if there are differences then it must surely be useful for the target language to know about them (and if there are none then ‘foreign word’ is simply an alias for noun).

This will make it necessary for parts of speech used in fallback readings to be standardised across languages, so French would use tags such as ‘noun’ and ‘adjective’ as opposed to ’substantif’ and ‘adjectif’. Fortunately I’ve being doing this anyway (albeit largely for my own convenience rather than as part of any grand plan). The same will apply to other types of tag such as ’singular’ and ‘plural’.

Should there be any attempt to create default inflection rules for fallback readings? I think probably yes. When words are borrowed from one language to another they often retain their original morphology in the first instance. At the very least it can’t do any harm to know which language the fallback was taken from. If a language wants to follow its own rules regardless of this information then it can override the default.

I’m aware that different languages have different parts of speech, and that while most languages have words which pass for nouns, adjectives, verbs and adverbs, that does not mean they have the same semantics or behaviour as English nouns, adjectives, verbs and adverbs. I also appreciate that inflecting completely alien words could prove to be difficult in some languages (although knowing that they are alien should help). However this is about producing something when the alternative is to provide nothing, so perfection is not a requirement.

Should languages omit readings if there is a suitable fallback reading? That’s a tricky question. On the one hand, to say no results in a large amount of duplication within the language description files. This is surely undesirable. On the other hand I do think that lists of differences will be more difficult to write, check and maintain, and that automatic propagation of changes is not desirable in the way that it is between dialects.

A good example of this is chemical elements in Danish. There is very nearly a 50-50 split between native and international names, so any saving would not be of the same order as (for example) British versus US English where there are only three differences. If a full list is given then it is possible to check that every element has been considered, whereas a partial list cannot be checked for completeness without redoing much of the research used to compile it in the first place.

For these reasons I’m minded to stick with full lists for the time being. However when compilation is introduced it would certainly be possible for any redundant readings to be automatically removed by the compiler, and I see no harm in that. Finally, I would only intend to duplicate readings for words which have been or are being assimilated into the language in question. Where a language has no word for a concept, the fallback will not be duplicated.

Dialects (Inheritance)

October 17th, 2009

There are several decisions to be made regarding implementation of the inheritance mechanism. First and most obviously, there needs to be some form of command to indicate that one language is derived from another. I considered two alternatives for the syntax:

  • a Java-style extends keyword, which would form part of a header at the start of the language description, or
  • a Perl-style use statement, which would be placed in the body of the language description.

The problem with a use statement is that the natural expectation would be for it to be allowed anywhere in the language description: later declarations would override, earlier ones would be overridden. This is easy enough to implement: create a master-index (for each declaration type) listing all of the declarations that have not been overridden. The only snag is that I might not want to generate such an index (if optimising for start-up time and/or memory usage), in which case implementation becomes much more fiddly. I could live with this if there was an compelling need for mid-file inclusion, but I’m not convinced there is one.

That leaves the extends keyword. This already exists and is used for specifying inheritance relationships between morphological paradigms, so using it between languages would be a natural extension. It forces the point of inclusion to be at the start of the language description, so avoids the implementation issue described above. The specific syntax I have in mind is of the form:

language "en_US" extends "en";

Should multiple inheritance be supported? I don’t see any reason why not, but I can’t say I have any particular use in mind for it either. On that basis I’ve decided to allow only single inheritance in the first instance, but keep open the option of adding multiple inheritance later if needed.

Having established a hierarchy, it is then necessary to decide in detail how parent and child languages interact with each other. The general rules are that:

  • Everything defined in the parent is imported into the child.
  • Where search order is important (as is the case when objects are referred to by name), the child is searched first.

The specific effect on each type of object is currently as follows:

  • Tags could be made to override earlier definitions, but I can’t think of any good reason why you would want to do this: the effect on the parent language would likely be both profound undesirable. I’ve therefore forbidden tag definitions which conflict with an existing name.
  • Predicate definitions override any previous definition of the same name. This is potentially quite awkward to implement, because they are indexed both by name and by meaning. If the meaning of a predicate is overridden then searches of the latter index do not necessarily yield the correct result. Unlike tags, predicates definitely need to be able to override each other, but their meanings probably don’t. I’ve therefore forbidden predicate definitions which alter the meaning of an existing predicate, but allowed the fallback and any other attributes to be changed.
  • Paradigms and morpheme definitions override any previous definition of the same name. These are indexed only by name, so implementation is straightforward.
  • Grammatical productions are effectively cumulative. The dialect is searched before the parent language, but since there is currently no way to indicate that a production does not occur, it is not possible to override in any meaningful way.
  • Decomposition rules are searched in the same order as productions, but only the first match is applied. This makes overriding is possible in most cases, but because decompositions are applied recursively it requires significant trickery to turn one off entirely. (Replacing the pattern by itself won’t work.)
  • Agreement rules are searched in the same order again, but all matching rules are applied. This makes overriding difficult or impossible.
  • Transformations behave in a similar way to decomposition rules, but since they are not applied recursively to any single point in the text they are somewhat easier to override.
  • Readings don’t override each other within a given language description file, but they do override any readings with the same meaning in any parent language. (Coincidently, this is very similar to how inheritance and overloading interact with each other in C++.) There is some potential for unexpected behaviour, but I think the alternatives would be worse.

Of these I would consider the behaviour of tags, predicates and readings to be satisfactory, but intend to look further at decomposition rules, productions, agreement rules and transformations. I think it is clear that inheritance works best with objects that have names. I’m reluctant to give every rule a name, but grouping them together into named ‘translation phases’ would probably suffice (and bring other benefits too).

Finally, a word about when inheritance should be used. The answer is: whenever it makes good engineering sense to do so. It does not matter whether linguists would classify the difference as one of language or dialect, or the manner in which it evolved historically. The most important considerations are:

  • whether (and to what extent) inheritance reduces duplication, and
  • whether it makes sense for changes to the parent language to propagate into the child.

Update 2009-10-18: I’ve now implemented Brazilian Portuguese and American English using inheritance. In addition to some changed spellings, this includes the American style of writing numbers in hundreds up to 9999, and omitting the ‘and’ in compound numbers. I’m not completely convinced regarding the last of these points: some speakers are quite adamant that use of ‘and’ is wrong, but evidence of usage in reasonably formal contexts is not so conclusive. I’d welcome comment as to whether I’ve got this right or wrong.

Dialects

October 14th, 2009

Currently the translation system supports only one dialect of any given language, so English means British English and Portuguese means European Portuguese. The United States and Brazilian dialects of these languages are sufficiently different (and popular) that they certainly ought to be supported too, but similar enough that writing entirely separate language descriptions would result in an undesirable amount of duplication.

What is needed is a mechanism which allows language description files to share common data. This would bring two benefits:

  • a reduction the amount of memory and disc space consumed;
  • automatic propagation of any corrections or improvements to a language to its dialects.

There are two ways in which sharing could be achieved:

  • by merging related dialects into a single language description file, then switching sections of that file in and out using some form of conditional notation; or
  • by requiring a separate language description file for each dialect, but allowing inheritance relationships between dialects such that only the differences need be specified.

Drawbacks of the first method are reduced modularity and readability. All dialects of a language would have to be loaded together, even if only one were needed. The language descriptions are likely to be quite complex enough handling one dialect, and if anything I would prefer to be looking at ways to break them down into smaller units rather than making them larger.

The main drawback of the second method is that it scales very poorly if the language can vary in several dimensions independently. For example, in Celtic languages the use of decimal versus vigesimal numbers is only loosely correlated with dialect and to a large extent is a matter of personal choice. You could write two language description files for each language, one for decimal and one for vigesimal, but then what happens when another issue is found where a similar choice is needed?

For these reasons I don’t think that either method provides a complete solution, so am inclined to implement both. This is not an unreasonable extravagance: many programming languages provide comparable facilities (such as #ifdef and #include in the C preprocessor).

Broadly speaking my intention is to use inheritance for regional dialects, and conditional rules for preferences which cut across those dialects. Inheritance is in the process of being implemented, and I will describe the syntax shortly. Conditional rules I don’t have a clear strategy for yet, but they are a less urgent requirement.

Merging the predicate dictionary into the language description

September 30th, 2009

I’m going to eliminate any formal distinction between the predicate dictionary and the language description. Instead there will be only one type of file, which can hold any of the currently supported types of declaration (predicate, morpheme, reading and so on).

Of course, I certainly don’t want to copy and paste the same set of predicate declarations into many separate files, but that won’t be necessary. There will need to be a way for one language to inherit declarations from another language in order to efficiently support dialects. The same mechanism, once it has been implemented, can be used across all languages to share a common set of predicate declarations.

One important difference between this and the current arrangement is that it will provide a basis for predicate declarations to be overridden. This will allow information to be associated with a predicate even if it is not strictly language-independent.

For example, Polish treat nouns differently according to whether they are animate or inanimate. For the most part animacy is defined as you would expect it to be, but there are marginal cases which are decided by convention (plants are generally inanimate, but viruses, bacteria and fungi are animate), and more than a few outright exceptions (units of currency, such as the złoty, are animate). To the extent that the classification is based on objective criteria it can and should be shared between languages, but exceptions rightly belong within the relevant language description.

Implementing this capability will not be a great burden. Arguably it simplifies the translation system slightly, and it avoids the annoyance (within the internal C++ API) of having to explicitly instantiate the predicate dictionary and provide a reference to it when constructing a language object.

Licensing

September 27th, 2009

The translation system is first and foremost an Open Source project, so it needs to be released under at least one OSD-compatible licence. Broadly speaking there are three classes of licence from which to choose:

  • fully reciprocal licences (such as the GPL),
  • reciprocal licences with a library exception (such as the LGPL), and
  • non-reciprocal licences (such as the modified BSD licence).

My preference is for a fully-reciprocal licence because that encourages other developers to give back to the community by releasing their own programs as Open Source, and the most widely used licence which does this is the GPL. However, I also want the translation system to be used as widely as possible, and imposition of the GPL by itself would prevent such use:

  • by Open Source projects that have chosen a licence that is incompatible with the GPL, and
  • by proprietary vendors who either cannot or choose not to release under the terms of an Open Source licence.

For this reason I want to retain the ability to grant further licences, including fully commercial ones if the recipient is not giving back in some other way. Currently that is straightforward because the code is entirely written by me, but the situation becomes more complicated if third-party contributions are added to the codebase.

In the case of the library itself I don’t foresee any substantial third-party contributions being needed, but for the language definition files they will be essential if the project is to be a success. There are several ways to preserve my ability to issue new types of licence:

  • by assigning copyright to me;
  • by giving me the right to sublicense on terms of my choosing; or
  • by licensing contributions permissively enough to coexist with any other licence I might want or need use.

Sun use a combination of the first two methods for contributions to Java and MySQL, and there are other companies with similar arrangements, but they are sometimes criticised for being overly one-sided. I wouldn’t personally object to contributing on such terms, but can see why others might.

I won’t claim that the third method is completely fair — dual-licensing schemes generally aren’t — but it is at least fairer than the other two. The protection it provides is weaker, because not all of the code is covered by the GPL, but still much better than placing everything under the BSD licence or the LGPL. I think it is an effective and defensible solution for the current situation where the project is first and foremost my work. (If others start making contributions of comparable size and value then I’m willing to negotiate.)

I’ve not settled on a precise form of words yet, but it will most likely be similar to the copyright disclaimer used by the GNU Project.

Names of Colours part 4: Implementation

August 31st, 2009

I’ve now had some experience working with the colour predicates described previously, and so far they have proved to be satisfactory. Certainly I have not yet had cause to wish that they were defined differently. However there are a number of constructions which cannot currently be translated, due in large part to how the word selection algorithm works. Here is an outline of what works, what doesn’t, and how the situation could be improved.

Unqualified hues present no difficulty provided that the readings given cover all of the allowed predicates. This can and often does result in several readings for the same colour name. For example, both colour:azure and colour:blue have been translated as ‘blue’ in English.

Hues qualified with colour:dark, colour:light or colour:bright also work as intended provided that they are bound together as a compound predicate, for example:

(colour:dark colour:orange) ⇒ ‘brown’

With a few additions to the language description this can be expanded to:

((colour:dark colour:orange) bio:genus:vulpes) ⇒ ‘brown fox’.

However, other permutations of these predicates have a less desirable surface form:

((colour:orange colour:dark) bio:genus:vulpes) ⇒ ‘orange dark fox’
(colour:dark (colour:orange bio:genus:vulpes)) ⇒ ‘dark orange fox’
(colour:orange (colour:dark bio:genus:vulpes)) ⇒ ‘orange dark fox’

The question is, should the translation system do better with these inputs, or should these inputs be avoided?

A partial answer to this question is that it shouldn’t matter whether colour:dark or colour:orange is specified first, because the effect of these predicates on the membership function is linear. By this I mean that if f(x) represents darkness and g(x) represents orangeness then:

f(g(x)) ∝ f(x) × g(x)

Since multiplication is commutative it follows that:

f(g(x)) ∝ g(f(x))

This does not mean that (colour:dark colour:orange) and (colour:orange colour:dark) should necessarily produce the same output, but the surface forms should at least be of similar quality, which is clearly not the case at present.

One solution would be to provide two readings for the word ‘brown’, but this would be inelegant, and scales poorly if more than two predicates were involved. The alternatives are to improve the word selection algorithm so as to recognise when permutations are equivalent to each other, or to force the predicates into a particular canonical order.

Many languages have a preferred order for adjectives, so some reordering will be needed whether or not it is a requirement for word selection. For those languages which don’t have a preferred order, there is no reason why one can’t be imposed anyway. Even for those languages which use adjective order to indicate emphasis, there is no need to preserve the original order of the predicates, because that would not be a correct way to deduce what should be emphasised.

However I can see one situation where reordering won’t help. Where a noun encompasses the meaning of one or more adjectives (such as ‘lamb’ or ‘ewe’ in place of ’sheep’) there is no guarantee that the predicates replaced will be canonically adjacent to each other (for example ‘young black sheep’). For this reason I think impovements to the word selection algorithm will be needed, even if canonicalisation is introduced too.

Regarding the question of associativity, (colour:dark (colour:orange bio:genus:vulpes)) is certainly acceptable: it merely applies the three predicates in sequence. As they are all descriptive and do not contradict each other there is no reason why this shouldn’t happen, but it is not something that the translation system can handle currently.

The colour in isolation, however, must be expressed as (colour:dark colour:orange) (or vice versa), because there is no other way in which two predicates can be combined. It follows that all of these forms need to be matched. The options are similar to before, except that there will almost certainly need to be changes to word selection (because the current system cannot replace a set of predicates which do not form a subtree).

It has occurred to me that I may be making life unnecessarily difficult for myself by using an explicit binary tree structure as opposed to something more akin to the list structures used in Lisp. In the latter case there is a terminator at the end of the list, so there is no structural difference when colour:dark and colour:orange are applied to each other or applied to something else. This would be a radical change that would affect the whole translation system, but I think it is worth considering.

Hedges

August 9th, 2009

In fuzzy logic, a ‘hedge’ is a type of unary operator which acts on the membership function of a fuzzy set. Hedges typically correspond in meaning to adverbs such as ‘very’ and ’slightly’. Effects may include displacing the peak of the membership function, or causing it to become more concentrated or spread out.

Many predicates need to have fuzzy semantics if they are to accurately represent their linguistic counterparts. For example, colour:bright is fully true for colours with a saturation and value of 100%, but would still be mostly true if the value were lowered to 90%. The way I envisage the membership function to be defined, it would become completely false only on reaching the grey line. Similar considerations apply to colour:pale and colour:dark.

Because there are degrees of truth to these predicates it is possible to talk of colours which are (for example) ‘very bright’ or ’slightly dark’. All that is needed are predicates to represent the required hedges. Here are the ones I intend to make available in the first instance:

  • not[1] inverts the membership function, so that members become non-members and vice versa. It is normally implemented by the function f(x) = 1−x.
  • very concentrates the membership function, peaking at the highest possible values. It is often implemented by the function f(x) = x2 but there are other ways to define it.
  • slightly concentrates the membership function, peaking at what would otherwise be the lowest possible values. A possible definition is (very not).
  • moderately concentrates the membership function, peaking at what would otherwise be middling values. A possible definition is the union of (not slightly) and (not very).

Hedges are sufficiently fundamental that I see no objection to their being placed in the global namespace.

Given that slightly and moderately could be defined in terms of other predicates it may be asked whether they are necessary. I’ve included them for three reasons: in the interests of brevity, to avoid committing to any particular definition of the membership function, and because I’ve not seen any evidence that decomposing these hedges would be of value linguistically.

On the other hand, I do intend to make use of constructs such as (not very) and (very slightly) because equivalent forms are used in many languages.

I’ve not (yet) defined a somewhat hedge because I’m not convinced that the way this term is used in fuzzy logic (typically implemented by means of the square root function) corresponds to ’somewhat’ in English, or indeed to any word that I have been able to identify. To my mind, if there were to be a somewhat hedge then it should peak between slightly and moderately.

A potentially useful benefit of defining very to mean f(x) = x2 is that it allows ((very dark) orange) to be re-expressed as (dark dark orange), which could then be translated as ‘dark brown’. I’m not going to commit to saying that very is defined this way, because I want to avoid that level of detail. I will say that it is an acceptable approximation, which is all the translation system needs to know in order to make use of it (since translation is not an exact process).

There will probably need to be restrictions on when and where these predicates can be used, and in what combinations. For example, whereas (very dark) makes sense and ought, (very (cardinal 5)) does not (because being fivefold is not an attribute of degree). I’ll return to this subject when I have more experience of implementation.

[1] Some sources class not as a hedge, others as part of a larger class of functions called modifiers.

Names of Colours part 3: Darkness, Lightness and Brightness

July 17th, 2009

The corners of the triangle that I described previously were black, white, and the saturated hue. The areas around those corners can be described by the adjectives ‘dark’, ‘pale’ (or ‘light’) and ‘bright’. My initial attempt at a system of predicates to describe regions of this triangle will be based on those adjectives:

  • colour:dark
  • colour:pale
  • colour:bright

These can be combined with each other, and also with hedge operators such as ‘very’ and ’slightly’. I’ve chosen the word ‘pale’ rather than ‘light’ because being both ‘dark’ and ‘light’ sounds like a contradiction in terms, whereas ‘dark’ and ‘pale’ suggests the desired meaning of ‘greyish’. As always, the choice of predicate name does not necessarily indicate how the concept should be translated. I’ve not yet decided what hedges to provide or how to define them, but will do soon.

There is a complication, in that the triangle is not symmetrical. Consider shades of blue. Whereas the bright corner represents the best example of ‘bright blue’, the dark corner does not qualify at all as ‘dark blue’ because it is actually pure black. Similarly, the pale corner is pure white. Put another way, the unqualified colour ‘blue’ doesn’t refer equally to all parts of the triangle. Instead it is most true for the region around to the bright corner, and not true at all on the grey edge that connects the other two corners.

For this reason I think it is necessary to define the hue predicates as if they had a built-in qualification of colour:bright. That gives approximately the right semantics for representing unqualified basic colours such as ‘red’, ‘yellow’, ‘green’ and ‘blue’. Further desirable effects are that:

  • Explicit qualification using colour:bright would reinforce the implicit qualification, placing the colour very close to the bright corner, whereas qualification using colour:dark or colour:pale would pull it only half-way down the triangle.
  • Qualification using one of colour:dark or colour:pale will cause the colour to peak at the edge of the triangle, as opposed to a band crossing the interior.
  • However dark or pale a colour is made, it cannot reach the grey edge of the triangle.

This raises the question of how you do represent colours along the grey edge of the triangle. As hue is mathematically indeterminate along that line I have no problem with treating these colours as special cases:

  • colour:black
  • colour:white
  • colour:grey

I’m leaving it open whether colour:black and colour:white are shorthand for extreme shades of grey, or whether they represent distinct concepts. Either way, they are important enough that they should not need to be explicitly synthesised.

That, I think, is as much as I want to specify at present. I don’t intend to tie the predicates to an absolute colour space (such as sRGB), partly because I haven’t been able to formulate any defensible basis for selecting one, but mainly because I’m not convinced that the colour terms typically used in natural languages imply any particular colour space. Similarly, I’ve avoided making any distinction between transmitted and reflected light (although there will be a need to address concepts such as transparency and reflectivity at some point in the future).

Names of Colours part 2: Hues

July 15th, 2009

Having established that one of the components of a colour should be its hue, it is necessary to decide what hues to provide, at what level or levels of precision, and how to name them.

In the first instance I intend to provide predicates for hues at 30° intervals and with the following names:

  • colour:red (0°)
  • colour:orange (30°)
  • colour:yellow (60°)
  • colour:chartreuse (90°)
  • colour:green (120°)
  • colour:spring-green (150°)
  • colour:cyan (180°)
  • colour:azure (210°)
  • colour:blue (240°)
  • colour:violet (270°)
  • colour:magenta (300°)
  • colour:rose (330°)

This ought to provide sufficient resolution for the colours that are commonly used in most natural languages. It won’t be sufficient for very specific colour names like ‘burnt umber’ or ‘cerulean blue’, but I have my doubts as to whether it is feasible to translate those at all. I’m open to the possibility of narrowing down to 15° or less if that would allow ordinary colours to be translated more accurately, but would want to see evidence.

I want to stress that just because a particular colour name has been used for a predicate does not mean that the same name will be used when it is translated into English, nor that languages need to provide distinct readings for all twelve hues. For example, my expectation is that the default English translations for colour:azure and colour:blue will both be ‘blue’.

It will be desirable to have some method available for expressing precision because there are some situations where you do need to distinguish between ‘azure’ and ‘blue’, however:

  1. in the case of colours this can and probably should be done by qualifying the predicates listed above in some way (not by creating a separate set of low-precision predicates); and
  2. rather than being specific to colours, it would be preferable for the method to be a generic one.

In view of the latter point I’m going to defer the issue of precision and treat it as a separate topic. In the meantime, any readings that are created will assume a ‘basic’ level of precision (meaning the level that would typically be used in the absence of any reason to be more specific).