정보과학회 논문지 B: 소프트웨어 및 응용 (Journal of the Information Science Society B: Software and Applications)
Korean Title |
베이지언 문서분류시스템을 위한 능동적 학습 기반의 학습문서집합 구성방법 (An Active Learning-Based Method for Composing Training Document Sets for Bayesian Text Classification Systems) |
English Title |
An Active Learning-based Method for Composing Training Document Set in Bayesian Text Classification Systems |
Author |
Je-uk Kim (김제욱)
Han-joon Kim (김한준)
Sang-goo Lee (이상구)
|
Citation |
VOL 29 NO. 12 PP. 0966 ~ 0978 (2002. 12) |
Korean Abstract (translated) |
Among the factors that determine the accuracy of text classification systems built with machine learning techniques, the most important are the selection of the training document set and the method used to compose it. The selection problem is that of choosing, from an arbitrary document space, a small set of highly informative documents and adopting them as training documents. The composition problem is that of reorganizing the selected training document set so as to build a more accurate classification function. The representative algorithm for the former problem is active learning, and for the latter it is boosting.
This paper applies these two algorithms to the Naïve Bayes text classification algorithm, analyzes the characteristics that arise, and proposes AdaBUS, a new method for composing the training document set. Drawing on the idea of active learning, the algorithm increases the variance among the temporary classification functions (weak hypotheses) that are built in order to produce the final classification function. This realizes the perturbation effect, a key concept that boosting needs in order to work effectively, and thereby raises classification accuracy. Empirical experiments on the Reuters-21578 document collection demonstrate that AdaBUS improves the accuracy of Naïve Bayes-based text classification systems more than existing algorithms do.
|
English Abstract |
There are two important problems in improving text classification systems based on machine learning approaches. The first, called the 'selection problem', is how to select a minimum number of informative documents from a given document collection. The second, called the 'composition problem', is how to reorganize the selected training documents so that they fit the adopted learning method. The former problem is addressed by 'active learning' algorithms, and the latter by 'boosting' algorithms.
This paper proposes a new learning method, called AdaBUS, which proactively solves the above problems in the context of Naïve Bayes classification systems. The proposed method constructs a more accurate classification hypothesis by increasing the variance among the 'weak' hypotheses that determine the final classification hypothesis. Consequently, the proposed algorithm yields a perturbation effect that makes the boosting algorithm work properly. Through empirical experiments using the Reuters-21578 document collection, we show that the AdaBUS algorithm improves the Naïve Bayes-based classification system more significantly than other conventional learning methods. |
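The selection problem described in the abstract can be illustrated with uncertainty-based sampling, one of the keywords listed for this paper. The following is a minimal sketch under stated assumptions, not the AdaBUS algorithm itself: the function names (`train_nb`, `posterior`, `most_uncertain`), the toy documents, and the top-two posterior margin used as the uncertainty measure are all illustrative choices that do not come from the paper.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Collect the counts needed for a multinomial Naive Bayes model."""
    classes = sorted(set(labels))
    vocab = {w for d in docs for w in d}
    word_counts = {c: Counter() for c in classes}
    class_counts = Counter(labels)
    for d, y in zip(docs, labels):
        word_counts[y].update(d)
    return classes, vocab, word_counts, class_counts


def posterior(model, doc):
    """Return {class: P(class | doc)} using Laplace smoothing.

    Words outside the training vocabulary are ignored.
    """
    classes, vocab, word_counts, class_counts = model
    n_docs = sum(class_counts.values())
    log_scores = {}
    for c in classes:
        total = sum(word_counts[c].values())
        s = math.log(class_counts[c] / n_docs)
        for w in doc:
            if w in vocab:
                s += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        log_scores[c] = s
    # Normalize in a numerically stable way.
    m = max(log_scores.values())
    exp = {c: math.exp(s - m) for c, s in log_scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}


def most_uncertain(model, pool):
    """Index of the pool document with the smallest top-two posterior margin.

    Assumes the model was trained on at least two classes.
    """
    def margin(doc):
        p = sorted(posterior(model, doc).values(), reverse=True)
        return p[0] - p[1]
    return min(range(len(pool)), key=lambda i: margin(pool[i]))


# Hypothetical toy data: two labeled documents and an unlabeled pool.
labeled = [["wheat", "grain", "export"], ["bank", "rate", "loan"]]
labels = ["grain", "money"]
pool = [["wheat", "grain"], ["bank", "loan"], ["wheat", "bank"]]

model = train_nb(labeled, labels)
query_index = most_uncertain(model, pool)
print(pool[query_index])
```

In an active learning loop, the selected document would be labeled by an oracle, moved from the pool to the labeled set, and the model retrained. How the selected documents are then reweighted or recombined is the composition problem, which this sketch does not address.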
Keyword |
Award-winning paper, 2002 Information Science Paper Competition
composing training document set
Naïve Bayes text classifier
boosting algorithm
uncertainty-based sampling algorithm
AdaBUS algorithm
|