
Journal of KIISE (정보과학회논문지)


Korean Title: 자원부족 환경에 적합한 BIT 개체명 표기법
English Title: A BIT Named Entity Format Suitable for Low Resource Environments
Authors: Ho Yoon, Chang-Hyun Kim, Min-ah Cheon, Ho-min Park, Young Namgoong, Min-seok Choi, Jae-kyun Kim, Jae-Hoon Kim
Citation: Vol. 48, No. 3, pp. 293-301 (March 2021)
(Korean Abstract, English translation)
Named entity recognition is the task of finding the spans of named entities in a given document and classifying them. Because many named entities consist of more than one word, most named entity training corpora are represented in the BIO format. The BIO format attaches "B-" to the tag of the word where a named entity begins, attaches "I-" to the tags of the other words inside the entity, and treats the tags of all words between named entities as "O". Because about 90% or more of the words carry the "O" tag, this method raises the perplexity of the "O" tag and causes an imbalanced learning problem. This paper proposes the BIT format in place of the BIO format. The BIT format converts the "O" tags of the BIO format into "T" tags, where in this paper a "T" tag denotes a part-of-speech tag. In experiments, the BIT format outperformed the BIO format when the semantic projection of the word representations was not high, that is, when the word representations were learned from a relatively small amount of training data.
(English Abstract)
Named entity recognition (NER) seeks to locate named entities in text and classify them into predefined categories such as person, organization, and location. Most named entities consist of more than one word, so most annotated NER corpora are encoded in the BIO (short for Beginning, Inside, and Outside) format: a "B-" prefix before a tag indicates that the word begins a named entity, an "I-" prefix indicates that the word is inside a named entity, and an "O" tag indicates that the word belongs to no named entity. In this format, words with "O" tags account for more than about 90% of the words in the corpora, which can cause two problems: high perplexity for words with "O" tags and imbalanced learning. In this paper, we propose a novel format for representing NER corpora, called the BIT format, which uses "T" (short for POS Tags) tags in place of "O" tags. Experiments show that the BIT format outperforms the BIO format when the meaning projection of the word representation is unreliable, namely, when word embeddings are trained on a relatively small number of words.
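
The relabeling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `bio_to_bit` and `tag_share`, the example sentence, and the Penn Treebank-style POS tags are all assumptions made for illustration, and the abstract does not say whether the paper writes "T" tags as bare POS tags (as done here) or with a prefix.

```python
from collections import Counter


def bio_to_bit(bio_tags, pos_tags):
    """Replace every "O" tag with that word's POS tag; keep B-/I- tags as-is.

    Assumption: the "T" tag is written as the bare POS tag of the word.
    """
    if len(bio_tags) != len(pos_tags):
        raise ValueError("bio_tags and pos_tags must have the same length")
    return [pos if bio == "O" else bio for bio, pos in zip(bio_tags, pos_tags)]


def tag_share(tags, tag):
    """Return the fraction of positions labeled exactly `tag`."""
    return Counter(tags)[tag] / len(tags) if tags else 0.0


# Hypothetical sentence with hand-assigned tags (not taken from the paper).
words = ["Alice", "Smith", "visited", "Seoul", "yesterday", "."]
pos = ["NNP", "NNP", "VBD", "NNP", "NN", "."]
bio = ["B-PER", "I-PER", "O", "B-LOC", "O", "O"]

bit = bio_to_bit(bio, pos)
for word, old, new in zip(words, bio, bit):
    print(f"{word}\t{old}\t{new}")

print("share of O tags in BIO:", tag_share(bio, "O"))
print("share of O tags in BIT:", tag_share(bit, "O"))
```

In this toy sentence half of the BIO labels are "O", while the BIT sequence spreads those positions over several POS labels, which is the imbalance the abstract points to. Real corpora would need POS tags from an existing annotation layer or a tagger, which the sketch does not cover.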
Keywords: BIT format, named entity recognition, Bi-LSTM/CRF, BIO format