Journal of the Korea Information Science Society (B): Software and Applications
Korean Title (translated) |
Word Sense Disambiguation Using Classification Information |
English Title |
Word Sense Disambiguation using Classification Information |
Author |
이호
백대호
임해창
Ho Lee
Daeho Baek
Haechang Rim
|
Citation |
VOL 24 NO. 07 PP. 0779 ~ 0789 (1997. 07) |
Çѱ۳»¿ë (Korean Abstract) |
´Ü¾î ÀÇ¹Ì ÁßÀǼº ÇØ°á(word sense disambiguation)Àº ¹®¸Æ ³»¿¡¼ ÁÖ¾îÁø ´Ü¾îÀÇ Á¤È®ÇÑ Àǹ̸¦ ÆǺ°ÇÏ´Â ÀÛ¾÷À¸·Î ´Ü¾î ÀǹÌÀÇ Á¤È®ÇÑ ÆǺ°Àº ±â°è ¹ø¿ªÀ̳ª Á¤º¸ °Ë»ö ½Ã½ºÅÛÀÇ ¼º´É Çâ»ó¿¡ µµ¿òÀ» ÁØ´Ù. º» ³í¹®¿¡¼´Â ´Ü¾î ÀÇ¹Ì ÁßÀǼº ÇØ°á ¹®Á¦¸¦ µ¶¸³ÀûÀÎ ÀÌÁø Ư¡(independent binary feature)À» ÀÌ¿ëÇÑ ºÐ·ù ¹®Á¦(classification problem)·Î °£ÁÖÇÏ¿© ÇØ°áÇϱâ À§ÇØ ShannonÀÇ Á¤º¸ À̷п¡ ±â¹ÝÇÑ ºÐ·ù Á¤º¸(classification information)¶ó´Â »õ·Î¿î °³³äÀ» Á¤ÀÇÇÑ´Ù. ºÐ·ù Á¤º¸´Â ÃÖÀû ºÎ·ù(most probable class)¿Í ºÐº°°ª(discrimination score)À¸·Î ±¸¼ºµÈ´Ù. ÃÖÀû ºÎ·ù´Â ÁÖ¾îÁø ÀÌÁø Ư¡°ú °¡Àå ¹ÐÁ¢ÇÑ °ü·ÃÀÌ ÀÖ´Â ºÎ·ù¸¦ ³ªÅ¸³»¸ç ºÐº°°ªÀº ±× Ư¡°ú ÃÖÀû ºÎ·ù¿Í °ü·Ã Á¤µµ¸¦ ÀǹÌÇÑ´Ù. »õ·Î¿î ÆÐÅÏÀÌ ÀԷµǾúÀ» ¶§ ÆÐÅÏÀ¸·ÎºÎÅÍ ÃßÃâÇÑ Æ¯Â¡µéÀÇ ºÐ·ù Á¤º¸¿Í °¡Àå ¹ÐÁ¢ÇÑ °ü·ÃÀ» °¡Áø ºÎ·ù°¡ ±× ÆÐÅÏÀÇ ºÎ·ù·Î °áÁ¤µÈ´Ù. ºÐ·ù Á¤º¸¸¦ ÀÌ¿ëÇÏ¿© ´Ü¾î ÀÇ¹Ì ÁßÀǼºÀ» ÇØ°áÇÏ·Á¸é, ´ÙÀǾ Ã⿬ÇÑ ¹®ÀåÀº ÆÐÅÏÀÌ µÇ°í, ±× ¹®Àå ³»ÀÇ ´Ü¾îµéÀº Ư¡ÀÌ µÈ´Ù. ´ÙÀǾ Æ÷ÇÔµÈ ¹®ÀåÀÌ ÀÔ·ÂµÇ¸é ±× ¹®Àå¿¡ ÃâÇöÇÑ ´Ü¾îÀÇ ºÐº°°ªÀº ÃÖÀû ºÎ·ùº°·Î ÇÕ»êÀÌ µÇ¸ç, ±× °á°ú °¡Àå ³ôÀº ºÐº°°ªÀ» °¡Áö´Â ºÎ·ù¿¡ ÇØ´çµÇ´Â Àǹ̰¡ ±× ¹®Àå ³»¿¡¼ ´ÙÀǾîÀÇ Àǹ̰¡ µÈ´Ù. Á¦¾ÈÇÑ ¹æ¹ýÀ» ½ÇÇèÇϱâ À§ÇØ, Çѱ¹¾î¿¡ ´ëÇؼ´Â 4 °³ÀÇ ´ÙÀǾîÀÇ ¿ë·Ê¸¦ ÃßÃâÇÑ ´ÙÀ½ Á߽ɾ ´ëÇؼ ¼öÀÛ¾÷À¸·Î ÀÇ¹Ì Å±ëÀ» ¼öÇàÇÏ¿© ÇнÀ ¹× ½ÇÇè ÀÚ·á ÁýÇÕÀ¸·Î ÀÌ¿ëÇÏ¿´À¸¸ç, ¿µ¾î¿¡ ´ëÇؼ´Â ±âÁ¸ ¸î ¸î ¿¬±¸¿¡¼ »ç¿ëÇÏ¿´´ø °øÅëÀÇ ÀÚ·á ÁýÇÕÀ» ÀÌ¿ëÇÏ¿´´Ù. À̵é ÀÚ·á Áý¤¸ÇÕ¿¡ ´ëÇØ º» ³í¹®¿¡¼ Á¦¾ÈÇÑ ´Ü¾î ÀÇ¹Ì ÁßÀǼº ÇØ°á ±â¹ýÀ» Àû¿ëÇÑ °á°ú Çѱ¹¾î¿¡ ´ëÇؼ´Â Æò±Õ 84.6%, ¿µ¾î¿¡ ´ëÇؼ´Â 80.0%ÀÇ Á¤È®µµ¸¦ ³ªÅ¸³»¾ú´Ù.
|
English Abstract |
The task of word sense disambiguation is to identify the correct sense of a word in context. Identifying word senses more accurately improves machine translation and information retrieval systems. In this paper, we treat word sense disambiguation as a classification problem that uses independent binary features. To solve it, we define a new notion, classification information, based on Shannon's information theory.
Classification information consists of the most probable class (MPC) and the discrimination score (DS). The MPC of a feature is the class most closely related to that feature, and the DS is the degree of correlation between the feature and its MPC. When a new pattern is given, it is assigned to the class most closely related to it, as judged by the classification information of the features extracted from the pattern. To resolve word sense ambiguity with classification information, we regard a sentence containing a polysemous word as a pattern and the surrounding words in the sentence as features. When a new sentence containing a polysemous word is given, the DS of every word surrounding the polysemous word is added to the total of the sense that is the word's MPC, and the polysemous word is assigned the sense with the largest accumulated DS.
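The accumulation step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the DS formula, so the sketch assumes an entropy-based score (maximum entropy minus the entropy of the sense distribution given the word), and the names `train`, `disambiguate`, and the toy sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(tagged_sentences):
    """Build {word: (mpc, ds)} from (context_words, sense) pairs.

    ASSUMPTION: DS = log2(number of senses) - H(sense | word).
    The abstract only says DS is based on Shannon's information theory.
    """
    counts = defaultdict(Counter)  # word -> sense -> number of sentences
    senses = set()
    for words, sense in tagged_sentences:
        senses.add(sense)
        for word in set(words):  # binary feature: presence only
            counts[word][sense] += 1

    max_entropy = math.log2(len(senses)) if len(senses) > 1 else 0.0
    info = {}
    for word, sense_counts in counts.items():
        total = sum(sense_counts.values())
        probs = [n / total for n in sense_counts.values()]
        entropy = -sum(p * math.log2(p) for p in probs)
        mpc = max(sense_counts, key=sense_counts.get)  # most probable class
        info[word] = (mpc, max_entropy - entropy)      # discrimination score
    return info


def disambiguate(words, info):
    """Add each known word's DS to its MPC and return the top-scoring sense."""
    scores = defaultdict(float)
    for word in set(words):
        if word in info:
            mpc, ds = info[word]
            scores[mpc] += ds
    return max(scores, key=scores.get) if scores else None


# Toy, made-up training data for the polysemous word "bank".
examples = [
    (["river", "water"], "shore"),
    (["money", "loan"], "finance"),
    (["river", "fishing"], "shore"),
    (["money", "water"], "finance"),
]
info = train(examples)
sense = disambiguate(["river", "water", "boat"], info)
```

In this toy data "water" occurs equally often with both senses, so under the assumed formula its DS is zero and it does not influence the decision, while "river" carries the full one bit. Words unseen in training are ignored, and a sentence with no known words yields `None`.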
To test the proposed method, we use two different data sets. The first contains concordances of four Korean polysemous words whose keywords are manually sense-tagged. The second is an English data set that has been commonly used in several previous studies. Experimental results show that the average accuracy of the proposed method is 84.6% for the Korean data set and 80.0% for the English data set.
|