Dialects

Currently the translation system supports only one dialect of any given language, so English means British English and Portuguese means European Portuguese. The United States and Brazilian dialects of these languages are sufficiently different (and popular) that they certainly ought to be supported too, but similar enough that writing entirely separate language descriptions would result in an undesirable amount of duplication.

What is needed is a mechanism which allows language description files to share common data. This would bring two benefits:

  • a reduction in the amount of memory and disc space consumed;
  • automatic propagation of any corrections or improvements to a language to its dialects.

There are two ways in which sharing could be achieved:

  • by merging related dialects into a single language description file, then switching sections of that file in and out using some form of conditional notation; or
  • by requiring a separate language description file for each dialect, but allowing inheritance relationships between dialects such that only the differences need be specified.
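To make the two options concrete, here is a minimal sketch in Python. It is not the translation system's own notation: the dictionary layout, the "*" wildcard, the dialect tags and the vocabulary entries are all assumptions chosen purely for illustration.

```python
from collections import ChainMap

# Method 1 (hypothetical layout): one merged description in which each
# entry lists its variants by dialect, with "*" meaning "all dialects".
MERGED_ENGLISH = {
    "greeting": {"*": "hello"},
    "colour": {"en-GB": "colour", "en-US": "color"},
}

def lookup_merged(key, dialect):
    """Pick the variant for this dialect, falling back to the shared one."""
    variants = MERGED_ENGLISH[key]
    return variants.get(dialect, variants.get("*"))

# Method 2 (hypothetical layout): one description per dialect, where the
# child lists only its differences and inherits the rest from its parent.
EN_GB = {"greeting": "hello", "colour": "colour"}
EN_US = ChainMap({"colour": "color"}, EN_GB)

print(lookup_merged("colour", "en-US"))    # color
print(lookup_merged("greeting", "en-US"))  # hello
print(EN_US["colour"])                     # color
print(EN_US["greeting"])                   # hello (inherited from EN_GB)
```

In the second form a correction made to EN_GB reaches EN_US automatically, and EN_GB can be loaded without EN_US.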

Drawbacks of the first method are reduced modularity and readability. All dialects of a language would have to be loaded together, even if only one were needed. The language descriptions are likely to be quite complex enough handling one dialect, and if anything I would prefer to be looking at ways to break them down into smaller units rather than making them larger.

The main drawback of the second method is that it scales very poorly if the language can vary in several dimensions independently. For example, in Celtic languages the use of decimal versus vigesimal numbers is only loosely correlated with dialect and to a large extent is a matter of personal choice. You could write two language description files for each language, one for decimal and one for vigesimal, but then what happens when another issue is found where a similar choice is needed?
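The scaling problem can be seen by counting files. The sketch below assumes that every combination of dialect and independent choice needs its own description file; the dialect names and the second choice are hypothetical placeholders.

```python
from itertools import product

def description_files(dialects, choices):
    """List every (dialect, option, ...) combination needing its own file."""
    return list(product(dialects, *choices.values()))

dialects = ["north", "south"]  # hypothetical regional dialects

# One independent choice: number of files doubles.
one_choice = {"numbers": ["decimal", "vigesimal"]}
print(len(description_files(dialects, one_choice)))   # 4

# A second independent choice (placeholder) doubles it again.
two_choices = {"numbers": ["decimal", "vigesimal"],
               "another_choice": ["a", "b"]}
print(len(description_files(dialects, two_choices)))  # 8
```

Each further two-way choice doubles the count again, which is why inheritance alone does not cope well with variation in several independent dimensions.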

For these reasons I don’t think that either method provides a complete solution, so I am inclined to implement both. This is not an unreasonable extravagance: many programming languages provide comparable facilities (such as #ifdef and #include in the C preprocessor).

Broadly speaking my intention is to use inheritance for regional dialects, and conditional rules for preferences which cut across those dialects. Inheritance is in the process of being implemented, and I will describe the syntax shortly. Conditional rules I don’t have a clear strategy for yet, but they are a less urgent requirement.
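As a rough illustration of how the two mechanisms might combine, the following Python sketch uses inheritance for a regional dialect and a conditional rule for a preference that cuts across dialects. Everything here is an assumption made for the example: the Conditional class, the "numbers" preference name, the lookup function and the Welsh entries are illustrative and do not reflect the actual syntax, which has not been described yet.

```python
from collections import ChainMap

class Conditional:
    """Hypothetical rule whose value depends on a named user preference."""
    def __init__(self, preference, options, default):
        self.preference = preference
        self.options = options
        self.default = default

    def resolve(self, prefs):
        # Use the user's setting if given, otherwise the rule's default.
        return self.options[prefs.get(self.preference, self.default)]

# Base description: one plain entry and one preference-dependent entry.
WELSH = {
    "milk": "llaeth",
    "fifteen": Conditional(
        "numbers",
        {"decimal": "un deg pump", "vigesimal": "pymtheg"},
        default="decimal",
    ),
}

# Regional dialect: inherits everything, overriding only what differs.
NORTH_WELSH = ChainMap({"milk": "llefrith"}, WELSH)

def lookup(description, key, prefs):
    """Find a rule via inheritance, then apply preferences if it is conditional."""
    rule = description[key]
    return rule.resolve(prefs) if isinstance(rule, Conditional) else rule

print(lookup(NORTH_WELSH, "milk", {}))                           # llefrith
print(lookup(NORTH_WELSH, "fifteen", {"numbers": "vigesimal"}))  # pymtheg
print(lookup(WELSH, "fifteen", {}))                              # un deg pump
```

The regional file stays small because the number preference is stated once in the base description and is inherited by every dialect.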
