Sunday, May 28, 2006

About splitters and lumpers

When you are working on the creation of a resource like WiktionaryZ, one of the biggest issues is the theoretically large number of different concepts or, better, DefinedMeanings. Some people (the splitters) are of the opinion that "clear" distinctions have to be made between the different ways an Expression (word or phrase) is to be understood. Another group (the lumpers) is of the opinion that it is "clear" that such a fine-grained set of definitions is more confusing than helpful, and that a reduced set of DefinedMeanings is therefore in order.

This is a recipe for many quarrels; there is an obvious need for an objective way out.

At the LREC 2006 conference, Ed Hovy gave a keynote speech that I really enjoyed and that, to my mind, has implications for how we can resolve these questions. The key point is that when a word or phrase has different senses, it should be possible to have a group identify these senses in a corpus and achieve at least 90% agreement. One really nice side effect is that it showed empirically that fewer definitions typically give better results.
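To make the 90% criterion concrete, here is a minimal sketch of how such a check could look. It assumes that "agreement" means simple observed agreement averaged over all pairs of annotators; the keynote may well have used a different measure, and chance-corrected measures such as Cohen's kappa exist too. The function names, the example word "bank" and the sense labels are all made up for illustration and are not part of WiktionaryZ.

```python
from itertools import combinations


def pairwise_agreement(annotations):
    """Mean fraction of corpus occurrences on which two annotators picked the same sense.

    `annotations` holds one list per annotator; position i in every list is the
    sense label that annotator gave to occurrence i of the word in the corpus.
    """
    if len(annotations) < 2:
        raise ValueError("need at least two annotators")
    n_items = len(annotations[0])
    if n_items == 0 or any(len(a) != n_items for a in annotations):
        raise ValueError("every annotator must label the same, non-empty set of occurrences")

    scores = []
    for first, second in combinations(annotations, 2):
        matches = sum(x == y for x, y in zip(first, second))
        scores.append(matches / n_items)
    return sum(scores) / len(scores)


def senses_are_distinguishable(annotations, threshold=0.9):
    """True when the annotators reach the required level of agreement (assumed 90%)."""
    return pairwise_agreement(annotations) >= threshold


# Hypothetical data: three annotators label ten occurrences of "bank"
# using a lumped inventory with only two senses.
lumped = [
    ["river", "money", "money", "river", "money", "money", "river", "money", "money", "money"],
    ["river", "money", "money", "money", "money", "money", "river", "money", "money", "money"],
    ["river", "money", "money", "river", "money", "money", "river", "money", "money", "money"],
]

# The same occurrences with "money" split into "institution" and "building";
# the annotators now disagree far more often about which one is meant.
split = [
    ["river", "institution", "building", "river", "institution", "institution", "river", "building", "institution", "institution"],
    ["river", "building", "building", "institution", "institution", "building", "river", "institution", "institution", "building"],
    ["river", "institution", "institution", "river", "building", "institution", "river", "building", "building", "institution"],
]

print(pairwise_agreement(lumped), senses_are_distinguishable(lumped))
print(pairwise_agreement(split), senses_are_distinguishable(split))
```

With this invented data the lumped inventory passes the 90% bar and the split one does not. That is the kind of objective signal that could settle a quarrel between splitters and lumpers.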

When you have a more limited set of DefinedMeanings, there are also Definitions that are part of either a domain or a resource that has its place in WiktionaryZ. My solution would be to allow these as secondary definitions. They are there to show the different ways a concept is understood, and they also provide a link to the resources that are imported into WiktionaryZ.

One thing we will have to find out is how this works out when we consider the multilingual aspect of WiktionaryZ. I am fairly confident that adopting this approach will help us reduce the number of instances where the "Identical meaning" flag is turned off.


