Wednesday, November 15, 2006

Why Multilingual MediaWiki matters

You might think that MediaWiki, the software that runs Wikipedia (and WiktionaryZ), is fully multilingual. After all, everyone knows that Wikipedia exists in more than 200 languages -- all of them intricately connected through a network of "interlanguage links" from one edition to the other.

In fact, each language edition of Wikipedia runs on a separate database. Wikimedia uses a special setup to share code and configuration files, but that setup is not trivially reproducible. If you want to set up a wiki in multiple languages, your best bet is to set up multiple instances of MediaWiki. As a result, most MediaWiki installations around the world are monolingual: they are set up for content in a single language.

Of course, users are free to put, say, French language content into an English language wiki. They can even change their user interface language preference to French. But the problems begin quickly:
  • If multiple languages use the same title to refer to a particular page, you need to disambiguate it. In single database setups, this is typically done by appending the language code manually to the page title. These titles can quickly become messy and inconsistent.
  • As soon as activity picks up in multiple languages, the list of changes to the wiki quickly becomes cluttered with information that is useless to readers who do not speak the particular languages in which edits are made.
  • It is impossible to systematically search for pages in a particular language.
  • Pages about the same content in different languages have to be manually connected to each other, which is often done using templates. The wiki does not facilitate the process of interlanguage linking in a single installation. The interlanguage links which are used on Wikipedia are horribly inefficient, as a separate set of language links has to be maintained for each language.
  • The first experience for a user who does not speak the default language of the wiki is often negative. Unless the wiki has been specifically built (with policies and interface messages) to encourage multilingual contributions, such contributions are unlikely, even if they are theoretically possible.
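To put a rough number on the interlanguage link problem from the list above, here is a small Python sketch. It compares the Wikipedia-style approach, where every language version of a page lists all the others, with a hypothetical alternative in which each version only points at one shared record. The function names and the shared-record model are my own assumptions for illustration, not a description of how any existing software works.

```python
def links_per_language_lists(num_languages: int) -> int:
    """Links to maintain when every language version lists all the others."""
    return num_languages * (num_languages - 1)


def links_shared_record(num_languages: int) -> int:
    """Links to maintain when each version points at one shared record (hypothetical)."""
    return num_languages


# Compare the two approaches for a topic covered in 2, 10 and 200 languages.
for n in (2, 10, 200):
    print(n, links_per_language_lists(n), links_shared_record(n))
```

For a topic that exists in all 200 languages, the first approach means 200 × 199 = 39,800 individual links to keep in sync, against 200 in the second.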

Take a look at a big wiki hosting site like Wikia -- even though it sets up multilingual wikis on request (by setting up multiple databases), most of its wikis are monolingual by default. English is of course predominant. With hundreds of millions of Internet users who do not speak English but would be happy to contribute to these wikis, this is a tremendous lost opportunity. Outside a framework like Wikia, where users set up their own wikis, it gets even worse: very few people go to the effort of structuring their wikis to accept content in multiple languages.

Wikis have become as ubiquitous as forums or blogs. Whether we are talking about documentation, knowledge bases, directories, discussions, experiments in democracy, media archives -- there are millions of potential participants out there, waiting to be invited to contribute. Waiting to feel welcome. We need to reach out to them. It is not just the community that needs to make the decision to "go multilingual". It is the software that should support this decision as much as possible.

Fortunately, there is an answer: Multilingual MediaWiki. This set of specifications describes the changes to the MediaWiki software needed to accept content in multiple languages, to network it effectively, and to build truly multilingual communities. And fortunately, this is more than just a paper: it is being implemented by a very capable programmer, with financial support from the University of Bamberg, and another sponsor who shall remain unnamed for now. You can view the first prototype (still very messy :-) ), which showcases the functionality to a) store content in language "meta-namespaces" and keep it separate, and b) connect pages in different languages to each other.
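To make those two prototype features a bit more concrete, here is a minimal sketch in Python. It is not the actual Multilingual MediaWiki code or database schema: the class `MultilingualWiki` and all of its methods are hypothetical names I made up for illustration. The only ideas taken from the specification are that the language acts as a "meta-namespace" that keeps content separate, and that pages in different languages can be connected to each other.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# A page is identified by (language code, title), so the same title can
# exist in several languages without manual disambiguation.
PageKey = Tuple[str, str]


@dataclass
class MultilingualWiki:
    """Toy in-memory model; hypothetical, not the real MLMW design."""

    pages: Dict[PageKey, str] = field(default_factory=dict)
    # Every linked page maps to the full set of pages in its translation group.
    groups: Dict[PageKey, Set[PageKey]] = field(default_factory=dict)

    def save(self, lang: str, title: str, text: str) -> None:
        self.pages[(lang, title)] = text

    def get(self, lang: str, title: str) -> Optional[str]:
        return self.pages.get((lang, title))

    def titles_in(self, lang: str) -> List[str]:
        """All titles stored in one language's meta-namespace."""
        return sorted(title for (code, title) in self.pages if code == lang)

    def _group(self, key: PageKey) -> Set[PageKey]:
        return self.groups.get(key, {key})

    def link(self, a: PageKey, b: PageKey) -> None:
        """Declare two pages to be about the same content, merging their groups."""
        merged = self._group(a) | self._group(b)
        for key in merged:
            self.groups[key] = merged

    def translations(self, key: PageKey) -> List[PageKey]:
        """Every other page connected to this one, in any language."""
        return sorted(self._group(key) - {key})


wiki = MultilingualWiki()
wiki.save("en", "AIDS", "English text")
wiki.save("de", "AIDS", "Deutscher Text")  # same title, different language
wiki.save("fr", "SIDA", "Texte français")
wiki.link(("en", "AIDS"), ("de", "AIDS"))
wiki.link(("de", "AIDS"), ("fr", "SIDA"))
print(wiki.titles_in("de"))
print(wiki.translations(("en", "AIDS")))
```

In this model, linking the French page to the German one is enough for the English page to find it as well, so nobody has to maintain a separate list of links for each language version.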

There's still quite some way to go until this becomes part of MediaWiki proper, but we are making steady progress. When the project is completed, thousands of MediaWiki installations across the planet will gradually become fully capable of accepting content in all languages of the world (if their owners want them to). It will be another step in opening up the world of wikis to the global community.

Beyond its very direct impact on wiki users, Multilingual MediaWiki (MLMW) is a requirement for WiktionaryZ. Right now, pages about expressions such as AIDS, which exist in many languages, can become very messy. Ideally, the user would, when looking up an expression, always specify which language it is in -- and then only see the DefinedMeanings in that language. However, for this to work effectively, MediaWiki must support looking up pages in a particular language, which is exactly the functionality that MLMW will provide.

This is one of many examples in which WiktionaryZ development benefits MediaWiki as a whole. Moreover, it reflects our philosophy to structure our work so that milestones can be reached independently whenever possible. We had some initial problems with the MLMW project -- a funding source ran out, and a developer team became unavailable. Fortunately, this did not impact the main WiktionaryZ development, and work could continue as soon as we found a new source of funding and a new developer.

I'm not aware of any other wiki engine which handles content in multiple languages well. Hopefully, MediaWiki will become the first.
