Living organisms
Sunday, March 29th, 2009
At this point it would be very useful to have some ordinary, countable nouns in the language definition files. One very large category of countable nouns, which I’ve been lining up for some time, is that of living organisms.
My stated policy is to use existing naming standards when defining predicates unless there is a very good reason not to. In this case, the obvious standard to turn to is that of Linnaean taxonomy - the source of what are popularly known as the ‘scientific’ or ‘Latin’ names of plant and animal species.
In addition to providing unique names for individual species, this serves the important purpose of grouping species into larger categories. It is necessary to do this because many common names refer to groups rather than individual species. Examples include ‘rodent’ (the order Rodentia) and ‘insect’ (the class Insecta).
The specific syntax I have in mind is to place everything in the namespace ‘bio’ (biology), then have secondary namespaces for each taxonomic rank. These would be followed by the taxon name itself (making further use of the namespace separator in the case of binomial or trinomial taxa). For example:
bio:species:suricata:suricatta (meerkat)
bio:ordo:primates (primate)
bio:infraclassis:marsupialia (marsupial)
Note the use of Latin names for ranks, non-abbreviated genus names, and all-lowercase script. The result of applying one of these predicates is true to the extent that its argument is composed of instances of the specified taxon. It is not true for subordinate taxa, nor (without appropriate qualification) for parts or derivatives of the relevant organisms (you can make beef from a cow but not vice versa). Quantity, gender and age are unspecified.
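The naming convention above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `make_predicate` and `parse_predicate` are hypothetical, and the set of rank names is limited to those that appear in this post rather than being a definitive list.

```python
# Latin rank names seen in this post; the exact set is an assumption.
KNOWN_RANKS = frozenset({
    "regnum", "phylum", "subphylum", "classis", "infraclassis",
    "ordo", "familia", "genus", "species",
})


def make_predicate(rank, *name_parts):
    """Join a rank and the parts of a taxon name into a 'bio:' predicate name."""
    rank = rank.lower()
    if rank not in KNOWN_RANKS:
        raise ValueError(f"unknown rank: {rank!r}")
    # Names are written in all-lowercase script, with the genus unabbreviated.
    parts = [part.lower() for part in name_parts]
    if not parts or not all(part.isalpha() for part in parts):
        raise ValueError(f"invalid taxon name: {name_parts!r}")
    # Binomial and trinomial names reuse the namespace separator.
    return ":".join(["bio", rank, *parts])


def parse_predicate(name):
    """Split a predicate name into (rank, tuple of taxon name parts)."""
    fields = name.split(":")
    if len(fields) < 3 or fields[0] != "bio":
        raise ValueError(f"not a bio predicate: {name!r}")
    return fields[1], tuple(fields[2:])


print(make_predicate("species", "Vulpes", "vulpes"))  # bio:species:vulpes:vulpes
print(parse_predicate("bio:ordo:primates"))           # ('ordo', ('primates',))
```

Reusing the separator for each part of a binomial or trinomial name means that a parser only needs to split on ‘:’ and treat everything after the rank as the taxon name.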
The Linnaean system is not a problem-free choice, for several reasons:
- Linnaean names sometimes change in response to new discoveries.
- Biologists do not always agree on how a species should be named or classified.
- The relationship between Linnaean names and common names can be less than straightforward.
However, in my opinion the alternatives are even less attractive:
- Competing scientific systems tend to map even less neatly onto common names than the Linnaean system, due to their emphasis on genotype (genetic code) rather than phenotype (physical form).
- Common names could be used in the source language, but to maintain any pretense of rigour they would need to be defined somehow - probably using the Linnaean system. Furthermore, while they would undoubtedly translate well into languages which draw their semantic boundaries in the same place, the problems caused by any deviation from this ideal would be magnified (there then being two sets of idiosyncrasies to resolve instead of one).
For these reasons I’m satisfied that the Linnaean system provides the best available basis for the naming scheme, but with two reservations which mean that it will be necessary to use it selectively.
The first is that, unlike the case of chemical elements, it would be neither practicable nor desirable to achieve exhaustive coverage: there are simply too many species out there, not to mention other ranks such as classes, legions, infraorders, superfamilies and subspecies. The second is that, in order to provide sufficient coverage without using an unreasonably large number of names, it will be necessary for the depth of coverage to be non-uniform.
To illustrate why the coverage cannot reasonably be uniform, compare the subspecies Ursus arctos horribilis (grizzly bear) with the phylum Nematoda (nematode worm). Phyla and subspecies are almost at opposite ends of the taxonomic scale, but their corresponding common names are both at the limit of detail that would be expressed in normal speech. (If anything, ‘grizzly bear’ has much more right to be considered a common name than ‘nematode worm’.)
To put this in perspective, the phylum to which grizzly bears belong is Chordata, but that includes all mammals, birds, reptiles, amphibians and fish. Not all of these have distinct common names, but there are several hundred at least which do. For comparison there are between 80,000 and 500,000 species of nematode worm but, as Wikipedia says, they are ‘very difficult to distinguish’ and non-specialists generally don’t. If any fixed taxonomic rank were chosen as a cut-off then the system would clearly either fail to provide sufficient detail, or include a huge amount of detail that was not linguistically significant.
Similar issues arise for any fixed set of intermediate ranks. My default policy will be to draw names from what are called the ‘major ranks’ (kingdom, phylum/division, class, order, family, genus and species), but not from minor ranks unless there is a good reason to.
An example of a species for which these ranks map relatively well to common names is Vulpes vulpes:
bio:species:vulpes:vulpes (red fox)
bio:genus:vulpes (fox)
bio:familia:canidae (dog)
bio:ordo:carnivora (carnivore)
bio:classis:mammalia (mammal)
bio:subphylum:vertebrata (vertebrate)
bio:phylum:chordata (chordate)
bio:regnum:animalia (animal)
Even here, some difficulties can be seen. The common name for members of the phylum Chordata is ‘chordate’, but this is not a word that non-specialists would be likely to use in everyday speech. The more common name ‘vertebrate’ corresponds to the subphylum Vertebrata, which is a minor rank, but if we were to include all minor ranks then many would not have common names at all (or would have names so obscure that they would only be meaningful to biologists).
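The default policy (major ranks always eligible, minor ranks only by exception) might be expressed along the following lines. This is a minimal sketch: the function name `is_eligible`, the Latin spellings of the major ranks (including ‘divisio’ for division), and the choice of exceptions are all assumptions made for illustration, with the two exceptions taken from the examples in this post.

```python
# The major ranks, in Latin (spellings assumed to follow the examples above).
MAJOR_RANKS = frozenset({
    "regnum", "phylum", "divisio", "classis",
    "ordo", "familia", "genus", "species",
})

# Minor-rank taxa admitted because they match a widely used common name.
# Hypothetical entries, taken from the examples in this post.
MINOR_RANK_EXCEPTIONS = frozenset({
    ("subphylum", "vertebrata"),      # vertebrate
    ("infraclassis", "marsupialia"),  # marsupial
})


def is_eligible(rank, taxon):
    """Return True if a taxon at this rank may be considered for a predicate name."""
    if rank in MAJOR_RANKS:
        return True
    return (rank, taxon) in MINOR_RANK_EXCEPTIONS


print(is_eligible("genus", "vulpes"))          # True
print(is_eligible("subphylum", "vertebrata"))  # True
print(is_eligible("subphylum", "tunicata"))    # False
```

Eligibility here only says that a name may be considered; it does not by itself mean that the name is worth including.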
Because of this selectivity it won’t be possible to simply pick a species or genus name and use it in BabelScript source text. Instead there will need to be a list of names that are allowed (just as there will be for most other types of predicate). That also solves the problem of what happens when species are renamed or reclassified, or when there is disagreement as to what the name or classification should be: so far as the translation system is concerned, the list will be definitive.
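As a rough illustration of such a list, the sketch below treats it as a simple lookup table keyed by predicate name. The table contents are just the red fox examples from this post, and the names `ALLOWED_PREDICATES` and `check_predicate` are hypothetical rather than part of any existing BabelScript tooling.

```python
# A definitive list of allowed names, each with an English gloss.
# Only the red fox examples from this post are included here.
ALLOWED_PREDICATES = {
    "bio:species:vulpes:vulpes": "red fox",
    "bio:genus:vulpes": "fox",
    "bio:familia:canidae": "dog",
    "bio:ordo:carnivora": "carnivore",
    "bio:classis:mammalia": "mammal",
    "bio:subphylum:vertebrata": "vertebrate",
    "bio:phylum:chordata": "chordate",
    "bio:regnum:animalia": "animal",
}


def check_predicate(name):
    """Return the gloss for an allowed name, or raise ValueError otherwise."""
    try:
        return ALLOWED_PREDICATES[name]
    except KeyError:
        # A well-formed Linnaean name is still rejected if it is not listed.
        raise ValueError(f"predicate not in the allowed list: {name!r}") from None


print(check_predicate("bio:genus:vulpes"))  # fox
```

Because validity is decided by membership of the list alone, a later renaming or reclassification in the scientific literature would not change which names are accepted unless the list itself were revised.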
I’m undecided what to do about common names which map very poorly onto the hierarchy. An example is the concept of ‘fish’, which at best maps to a group of vertebrates with no more in common with each other than with amphibians, reptiles and mammals. At worst it includes an assortment of crustaceans and other organisms which are entirely unrelated. One option is to resurrect obsolete terms such as the class Pisces, which are no longer used scientifically but which map well to the corresponding linguistic concepts. Another would be to revert to common names in those cases.
Update 2009-07-13: it is apparently permissible to use the same name for both a plant and an animal taxon. That means it will be necessary to further distinguish the predicate names (preferably without adding another level, as they are already quite long enough).