Tuesday, October 31, 2006

What to do when a word is not found

One of Ronald Beelaard's favourite stories is about how many people cannot spell and consequently do not find what they are looking for. The example he has often used is the Dutch word "papegaai" (parrot), a word many people do not know how to spell.

What Ronald did for the library he was associated with was write what amounts to an "alternative spelling" function. There are plenty of examples of how this can be done; Google is one that comes to mind.
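Ronald's actual algorithm is not described here, so the following is only a minimal sketch of one common approach: rank known words by Levenshtein edit distance to the failed query and offer the closest ones. The names `levenshtein`, `suggest`, `max_distance` and the tiny `lexicon` are hypothetical illustrations.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b (classic dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def suggest(query, lexicon, max_distance=2, limit=5):
    """Hypothetical helper: known words within max_distance edits, closest first."""
    scored = sorted((levenshtein(query.lower(), w.lower()), w) for w in lexicon)
    return [w for d, w in scored if d <= max_distance][:limit]


lexicon = ["papegaai", "papaya", "paperclip", "pagina"]
print(suggest("papagaai", lexicon))  # ['papegaai']
```

Comparing against every word is fine for a small word list; a large lexicon would need an index (for example a BK-tree or n-gram lookup) to keep this fast.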

For WiktionaryZ, it will not be as straightforward as it is for a monolingual resource. When a word is not found, it is important to know what language the word should be in. Also, WiktionaryZ still lacks many words that you would expect in a resource that wants to include all words. This is best demonstrated by the fact that several of the 1000 most popular words in English are still missing.

When a word is not found, it would be cool to have functionality that helps us understand what the problem is. The first step would be to count the misses. It would be just as useful to count the hits, because only the percentage of searches that fail tells us how well we are doing.
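A minimal sketch of that bookkeeping, assuming each lookup is logged with the language it was made in (the `SearchStats` class and its methods are hypothetical): counting hits alongside misses is what makes a miss percentage possible.

```python
from collections import Counter


class SearchStats:
    """Hypothetical per-language counter of lookup hits and misses."""

    def __init__(self) -> None:
        self.hits = Counter()    # language code -> number of successful lookups
        self.misses = Counter()  # language code -> number of failed lookups

    def record(self, language: str, found: bool) -> None:
        (self.hits if found else self.misses)[language] += 1

    def miss_rate(self, language: str) -> float:
        """Percentage of lookups in this language that found nothing."""
        total = self.hits[language] + self.misses[language]
        return 100.0 * self.misses[language] / total if total else 0.0


stats = SearchStats()
for found in [True, True, True, False]:
    stats.record("en", found)
print(f"{stats.miss_rate('en'):.1f}%")  # 25.0%
```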

The words most often not found are the ones most relevant to add to WiktionaryZ; adding them will improve its function as a resource.
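Turning the misses into a to-do list could look like the sketch below, assuming failed lookups are logged as `(language, word)` pairs (the `most_wanted` function and the sample log are hypothetical). Keying on the language as well as the word follows the point that a miss only means something once you know which language the word belongs to.

```python
from collections import Counter


def most_wanted(failed_queries, n=3):
    """Rank (language, word) pairs by how often a lookup for them failed.
    Words are trimmed and lower-cased so trivial variants count together."""
    counts = Counter((lang, word.strip().lower()) for lang, word in failed_queries)
    return counts.most_common(n)


failed = [("nl", "papegaai"), ("en", "pigeon"), ("nl", "Papegaai"),
          ("nl", "duif"), ("nl", "papegaai ")]
print(most_wanted(failed, n=1))  # [(('nl', 'papegaai'), 3)]
```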

Thanks,
GerardM

1 Comment:

Anonymous said...

As Gerard already mentioned, these were my experiences.

Initially 35% of the search requests in the library system failed (bear in mind that the people using it are generally used to reading books and the like). After analysis I found the main cause to be spelling errors: not just typos or misspelled author names (which is very understandable), but also misspelled nouns in keyword searches. Most spelling errors were very persistent (repeated within a short time frame).

After the implementation of the improvements, the failure rate dropped to something like 3%.

The algorithms are obviously monolingual. Another famous example is "duif" (= pigeon), whose plural is "duiven". Not only the plural ending ("en" or "s") is language dependent, but so is the change from "f" to "v".
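To show why such rules are language specific, here is a toy sketch that maps a Dutch plural query back to candidate singulars before looking it up. It covers only a handful of regular patterns (stripping "-en" or "-s", and undoing the v/f and z/s alternations) and is not a real Dutch morphology; the function names and the sample lexicon are hypothetical.

```python
def dutch_singular_candidates(word: str) -> list[str]:
    """Toy reversal of a few regular Dutch plural patterns.
    Ignores many real cases, e.g. vowel doubling as in bomen -> boom."""
    word = word.lower()
    candidates = [word]  # the query itself may already be a singular
    if word.endswith("en") and len(word) > 3:
        stem = word[:-2]
        candidates.append(stem)                 # papegaaien -> papegaai
        if stem.endswith("v"):
            candidates.append(stem[:-1] + "f")  # duiven -> duif
        elif stem.endswith("z"):
            candidates.append(stem[:-1] + "s")  # huizen -> huis
    elif word.endswith("s") and len(word) > 2:
        candidates.append(word[:-1])            # tafels -> tafel
    return candidates


def lookup(query, lexicon):
    """Return the first candidate that exists in the lexicon, or None."""
    for candidate in dutch_singular_candidates(query):
        if candidate in lexicon:
            return candidate
    return None


lexicon = {"duif", "huis", "tafel", "papegaai"}
for q in ["duiven", "huizen", "tafels", "papegaaien"]:
    print(q, "->", lookup(q, lexicon))
```

An English lookup would need entirely different rules (for example "-ies" to "-y"), which is why a multilingual resource has to know the intended language before it can normalize a query.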

Applying this experience to the wiki environment leads me to the conclusion that there is much room for improvement here.

Ronald

1:40 pm  
