Project Babel
Introduction
This is the home page of Project Babel, the primary aim of which is to develop an Open Source machine translation system which can automatically but accurately generate text in many different languages. It is hoped that this will reduce the amount of effort needed to create localised versions of user interfaces, documentation and web pages, and thereby increase the number of languages that it is practicable to support.
Strategy
Accurate translation from one natural language to another is a non-trivial problem because of the ambiguities which must be resolved. For this reason, an artificial language is being developed which is sufficiently unambiguous to avoid such difficulties. In the short term it is intended that all source text will be written using this language; in the longer term it may be feasible to support some form of hinted natural language.
For each supported target language there will be a dictionary containing readings for the available words. There will also be readings for phrases, to allow for cases where the meaning (or connotation) differs from what the individual words would suggest. Each reading will be tagged with the parts of speech that it is able to provide. Dialects will be supported by allowing one language description to be derived from another, so that only the differences need be specified.
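The dictionary and dialect arrangement described above might be modelled roughly as follows. This is only a sketch: the names `PartOfSpeech`, `Reading` and `Language`, the use of plain strings as dictionary keys, and the three example tags are all assumptions made for illustration, and are not taken from the project's library.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tag set; the real library's parts of speech are not listed here.
enum class PartOfSpeech { Noun, Verb, Adjective };

// One reading of a word or phrase, tagged with the parts of speech it can provide.
struct Reading {
    std::string text;
    std::set<PartOfSpeech> provides;
};

// A language description. A dialect names its parent and stores only the differences.
class Language {
public:
    explicit Language(const Language* parent = nullptr) : parent_(parent) {}

    // Add a reading for an entry (assumed here to be keyed by a plain string).
    void add(const std::string& key, Reading reading) {
        assert(!reading.provides.empty());  // a reading must offer some part of speech
        entries_[key].push_back(std::move(reading));
    }

    // Readings defined in this description take precedence; otherwise the
    // parent is consulted. Returns nullptr if no description has the entry.
    const std::vector<Reading>* readings(const std::string& key) const {
        auto it = entries_.find(key);
        if (it != entries_.end()) return &it->second;
        return parent_ ? parent_->readings(key) : nullptr;
    }

private:
    const Language* parent_;
    std::map<std::string, std::vector<Reading>> entries_;
};
```

In this sketch a dialect that defines an entry replaces all of the parent's readings for that entry rather than merging with them. Whether replacement or merging is the better behaviour is a design choice which the text above leaves open.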

It is provisionally intended that the translation process will consist of the following phases:

  1. listing the parts of speech which could be used, in isolation, to represent each fragment of source text;
  2. choosing a specific part of speech to represent each fragment of source text, such that the result can be formed into sentences;
  3. choosing the word or phrase which will be used to represent each fragment of source text;
  4. distributing information such as person, number and gender that is needed to ensure agreement;
  5. performing any transformations needed to conform to the grammar of the target language;
  6. performing any necessary morphological transformations on individual words and phrases;
  7. applying any ad hoc rules which are closely coupled to the surface structure, and which cannot conveniently be implemented at an earlier stage.
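The first two phases can be illustrated with a small sketch. It assumes that the candidates for a fragment are the union of the tags on its readings, and it stands in for a real grammar with a fixed list of acceptable sentence patterns. The names `Pos`, `Candidates`, `Pattern`, `list_candidates` and `choose_parts_of_speech` are hypothetical, and the approach shown is not necessarily the one used by the project.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical tag set, used only for this illustration.
enum class Pos { Noun, Verb, Adjective };

using Candidates = std::set<Pos>;   // parts of speech available for one fragment
using Pattern = std::vector<Pos>;   // one acceptable sentence shape (toy grammar)

// Phase 1 (sketch): collect every part of speech offered by any reading
// of a fragment. Each element of reading_tags is the tag set of one reading.
Candidates list_candidates(const std::vector<Candidates>& reading_tags) {
    Candidates result;
    for (const Candidates& tags : reading_tags) {
        result.insert(tags.begin(), tags.end());
    }
    return result;
}

// Phase 2 (sketch): choose one part of speech per fragment so that the
// sequence matches the first pattern that fits. Returns false, leaving
// 'chosen' untouched, if no pattern fits.
bool choose_parts_of_speech(const std::vector<Candidates>& fragments,
                            const std::vector<Pattern>& patterns,
                            Pattern& chosen) {
    for (const Candidates& candidates : fragments) {
        assert(!candidates.empty());  // phase 1 should offer at least one option
    }
    for (const Pattern& pattern : patterns) {
        if (pattern.size() != fragments.size()) continue;
        bool fits = true;
        for (std::size_t i = 0; i < pattern.size(); ++i) {
            if (fragments[i].count(pattern[i]) == 0) {
                fits = false;
                break;
            }
        }
        if (fits) {
            chosen = pattern;
            return true;
        }
    }
    return false;
}
```

A fixed pattern list is far too weak for a real language. It is used here only to show how the choice made in phase 2 is constrained by the candidates listed in phase 1.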
Status
The project is currently at an early stage of development and it will be some time before any generally usable software is released. However, it is now entering a phase where review and criticism by others would be extremely valuable, and for this reason it makes sense to switch to a more open development process.
Currently the most mature component of the project is the C++ library, which is able to read language definitions and implement many of the basic operations which will be needed to perform translations. It has been tested successfully with small vocabularies and grammars, but will require further work (particularly with regard to word choice and agreement) before it is capable of performing useful real-world tasks.
Language definition files are now in development for nearly two dozen languages. Currently these are of very limited scope: in most cases only the cardinal numbers have been implemented so far. It is hoped, however, that they can be gradually expanded to provide greater coverage. Please refer to the development blog for more detailed information about progress.
Further Information

There is a mailing list for announcements and discussion about this project. To subscribe, send an empty message to:

babel-request@lists.riscpkg.org

with a subject of “subscribe”.

The source code for the C++ library can be found in the source code repository of the RISC OS Packaging Project. Limited documentation for this library can be found in the header file and is suitable for processing using Doxygen.
Please note that the status of the lexical database is currently under review, and there is a possibility that it will be discontinued. For current language definition files please refer to the Subversion repository.
Further documentation is needed, and will be published as and when time permits.