Journal of KIISE

Korean Title (translated): A Methodology for Integrating Multiple Morphological Analysis Results to Construct a Large-Scale Standard Corpus
English Title: Unified Methodology of Multiple POS Taggers for Large-scale Korean Linguistic GS Set Construction
Author: Tae-Young Kim, Pum-Mo Ryu, Hansaem Kim, Hyo-Jung Oh
Citation: Vol. 47, No. 06, pp. 596-602 (June 2020)
Korean Abstract (translated)
Recently, there has been national-level support for constructing large-scale standard corpora for Korean linguistic analysis (GS: Gold Standard Set) and for sharing and disseminating them. As part of this corpus construction project, this study proposes a methodology for building a common answer set using the various Korean language analysis modules developed in Korea. In particular, to build a large training set, we minimized manual work in three steps. First, we referred to the candidate answers produced by multiple modules (N-modules). Second, we classified the error types. Third, we semi-automatically corrected the major error types. We normalized the outputs of the morphological analysis modules and built a large-scale Korean linguistic analysis standard corpus based on U-POS, a unified format. The resulting Korean standard corpus contains 348,229 sentences with a total of 9,455,930 eojeols (space-delimited word units), and it can later be used as a basic training resource for Korean information processing.
English Abstract
In recent years, there has been national support for constructing, sharing, and disseminating a large-scale Korean linguistic GS (Gold Standard) set for Korean information processing. As part of this corpus construction project, this study proposes a methodology for constructing the Korean linguistic GS set using various Korean language analysis modules developed in Korea. To build a large-scale training set, we referred to the automatically tagged candidate answers produced by multiple modules (N-modules). We then minimized manual effort by classifying the error types found in the candidate answers and semi-automatically correcting the major error types. We normalized the outputs of the morphological analyzers and constructed a large-scale Korean linguistic GS set based on the unified format U-POS. As a result, 348,229 sentences totaling 9,455,930 words (eojeols) were constructed as the Korean linguistic GS set, which can later serve as a basic training resource for Korean information processing.
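The abstract describes two steps. First, the outputs of several morphological analyzers are normalized into one shared format (U-POS). Second, the N candidate answers are consulted so that only problematic cases need manual work. The sketch below illustrates that general idea with a simple per-eojeol majority vote. It is not the authors' method.

The following are all hypothetical:
- the tagger names
- the native tags
- the shared tag labels
- `TAG_MAPS`, `normalize`, and `merge_eojeol`
- the `min_agree` threshold

The paper's actual U-POS format and error-type classification are not described on this page and are not reproduced here.

```python
from collections import Counter

# HYPOTHETICAL mappings from each tagger's native tags to a shared tag set.
# The paper's real U-POS format is not given here; these labels are illustrative only.
TAG_MAPS = {
    "tagger_a": {"NNG": "NOUN", "VV": "VERB", "JKS": "PARTICLE", "EF": "ENDING"},
    "tagger_b": {"ncn": "NOUN", "pvg": "VERB", "jcs": "PARTICLE", "ef": "ENDING"},
    "tagger_c": {"N": "NOUN", "V": "VERB", "J": "PARTICLE", "E": "ENDING"},
}


def normalize(tagger, morphemes):
    """Map one tagger's (form, native_tag) pairs onto the shared tag set.

    Tags missing from the mapping become "UNK" so they are never mistaken
    for a real shared tag.
    """
    mapping = TAG_MAPS[tagger]
    return tuple((form, mapping.get(tag, "UNK")) for form, tag in morphemes)


def merge_eojeol(outputs, min_agree=2):
    """Combine several taggers' analyses of one eojeol by a simple vote.

    outputs: dict of tagger name -> list of (form, native_tag) pairs.
    Returns (analysis, "auto") when at least `min_agree` taggers produce the
    same normalized analysis containing no "UNK" tags; otherwise returns
    (None, "review"), leaving the eojeol for manual or rule-based correction.
    """
    candidates = [normalize(tagger, morphs) for tagger, morphs in outputs.items()]
    best, count = Counter(candidates).most_common(1)[0]
    if count >= min_agree and all(tag != "UNK" for _, tag in best):
        return best, "auto"
    return None, "review"


if __name__ == "__main__":
    # "학교가" (school + subject particle): two taggers agree, one segments it differently.
    agree = {
        "tagger_a": [("학교", "NNG"), ("가", "JKS")],
        "tagger_b": [("학교", "ncn"), ("가", "jcs")],
        "tagger_c": [("학", "N"), ("교가", "N")],
    }
    print(merge_eojeol(agree))  # ((('학교', 'NOUN'), ('가', 'PARTICLE')), 'auto')

    # Three different analyses: no majority, so the eojeol is routed to review.
    split = {
        "tagger_a": [("가", "VV")],
        "tagger_b": [("가", "jcs")],
        "tagger_c": [("가", "N")],
    }
    print(merge_eojeol(split))  # (None, 'review')
```

With three taggers and `min_agree=2`, two agreeing analyses are accepted automatically, while a three-way split is set aside. A threshold vote is only one plausible way to combine candidate answers. The paper also classifies error types before semi-automatically correcting them, a step this sketch does not model.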
Keywords: Korean corpus, morphological analysis, POS tagging, semi-automatic construction