Archive for September, 2009

Merging the predicate dictionary into the language description

Wednesday, September 30th, 2009

I’m going to eliminate any formal distinction between the predicate dictionary and the language description. Instead there will be only one type of file, which can hold any of the currently supported types of declaration (predicate, morpheme, reading and so on).

Of course, I don’t want to copy and paste the same set of predicate declarations into many separate files, but that won’t be necessary. There will need to be a way for one language to inherit declarations from another in order to support dialects efficiently. The same mechanism, once it has been implemented, can be used across all languages to share a common set of predicate declarations.

One important difference between this and the current arrangement is that it will provide a basis for predicate declarations to be overridden. This will allow information to be associated with a predicate even if it is not strictly language-independent.

For example, Polish treats nouns differently according to whether they are animate or inanimate. For the most part animacy is defined as you would expect it to be, but there are marginal cases which are decided by convention (plants are generally inanimate, but viruses, bacteria and fungi are animate), and more than a few outright exceptions (units of currency, such as the złoty, are animate). To the extent that the classification is based on objective criteria it can and should be shared between languages, but exceptions rightly belong within the relevant language description.
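
As a rough illustration of how inheritance and overriding might fit together, here is a minimal C++ sketch. Every name in it (Language, PredicateDecl, declare, lookup, and the predicate names "zloty" and "tree") is hypothetical and invented for this example; it is not the translation system’s actual API, and a real language description would hold more kinds of declaration than the single one shown here. The assumption is simply that a lookup checks the language’s own declarations first and then falls back to whatever it inherits from.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical sketch only: none of these names come from the real system.
enum class Animacy { Inanimate, Animate };

// Stand-in for one kind of declaration (predicate); morphemes, readings
// and so on would be handled the same way.
struct PredicateDecl {
    Animacy animacy;
};

class Language {
public:
    // A language may optionally inherit declarations from a parent.
    explicit Language(std::string name, const Language* parent = nullptr)
        : name_(std::move(name)), parent_(parent) {}

    const std::string& name() const { return name_; }

    // Declare a predicate locally. A local declaration shadows any
    // inherited declaration with the same name.
    void declare(const std::string& predicate, PredicateDecl decl) {
        local_.insert_or_assign(predicate, decl);
    }

    // Search this language first, then walk up the inheritance chain.
    std::optional<PredicateDecl> lookup(const std::string& predicate) const {
        for (const Language* lang = this; lang != nullptr; lang = lang->parent_) {
            auto it = lang->local_.find(predicate);
            if (it != lang->local_.end()) {
                return it->second;
            }
        }
        return std::nullopt;  // not declared anywhere in the chain
    }

private:
    std::string name_;
    const Language* parent_;  // not owned; must outlive this object
    std::map<std::string, PredicateDecl> local_;
};

// Usage: a shared base classifies a unit of currency by objective
// criteria, and Polish overrides that one declaration.
inline Animacy zloty_animacy_in_polish() {
    Language common("common");
    common.declare("zloty", {Animacy::Inanimate});
    common.declare("tree", {Animacy::Inanimate});

    Language polish("pl", &common);
    polish.declare("zloty", {Animacy::Animate});  // the exception lives here

    return polish.lookup("zloty")->animacy;
}
```

In this sketch the shared base is just another Language object, so sharing predicates across all languages and deriving a dialect from a language use the same parent link. The override affects only the language that declares it: the common base still reports its own classification.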

Implementing this capability will not be a great burden. Arguably it simplifies the translation system slightly, and it avoids the annoyance (within the internal C++ API) of having to explicitly instantiate the predicate dictionary and provide a reference to it when constructing a language object.

Licensing

Sunday, September 27th, 2009

The translation system is first and foremost an Open Source project, so it needs to be released under at least one OSD-compatible licence. Broadly speaking there are three classes of licence from which to choose:

  • fully reciprocal licences (such as the GPL),
  • reciprocal licences with a library exception (such as the LGPL), and
  • non-reciprocal licences (such as the modified BSD licence).

My preference is for a fully reciprocal licence because that encourages other developers to give back to the community by releasing their own programs as Open Source, and the most widely used licence which does this is the GPL. However, I also want the translation system to be used as widely as possible, and imposing the GPL alone would prevent such use:

  • by Open Source projects that have chosen a licence that is incompatible with the GPL, and
  • by proprietary vendors who either cannot or choose not to release under the terms of an Open Source licence.

For this reason I want to retain the ability to grant further licences, including fully commercial ones if the recipient is not giving back in some other way. Currently that is straightforward because the code is entirely written by me, but the situation becomes more complicated if third-party contributions are added to the codebase.

In the case of the library itself I don’t foresee any substantial third-party contributions being needed, but for the language definition files they will be essential if the project is to be a success. There are several ways in which contributors could preserve my ability to issue new types of licence:

  • by assigning copyright to me;
  • by giving me the right to sublicense on terms of my choosing; or
  • by licensing contributions permissively enough to coexist with any other licence I might want or need to use.

Sun use a combination of the first two methods for contributions to Java and MySQL, and there are other companies with similar arrangements, but they are sometimes criticised for being overly one-sided. I wouldn’t personally object to contributing on such terms, but can see why others might.

I won’t claim that the third method is completely fair (dual-licensing schemes generally aren’t), but it is at least fairer than the other two. The protection it provides is weaker, because not all of the code is covered by the GPL, but it is still much better than placing everything under the BSD licence or the LGPL. I think it is an effective and defensible solution for the current situation where the project is first and foremost my work. (If others start making contributions of comparable size and value then I’m willing to negotiate.)

I’ve not settled on a precise form of words yet, but it will most likely be similar to the copyright disclaimer used by the GNU Project.